29 May 2020

FSD’s Kuha2 software improves discoverability of European research data

Kuha2 is a collection of applications for sharing descriptive social science metadata. It is aimed at cultural memory and research organisations that want to make their data descriptions available to other parties in a machine-readable format through automated collection, in other words, harvesting. This improves the visibility and findability of research data, strengthens cooperation between organisations and improves data description practices.

The Kuha2 software consists of several server applications and a client application, and it supports the OAI-PMH and OSMH¹ application programming interfaces (APIs). Development of Kuha2 began in early 2017 as part of the CESSDA SaW project, whose aim was to support new and aspiring CESSDA archives in finding, using and developing technical solutions. At the end of 2017, Kuha2 was released as open-source software. This first production-ready version was extensively documented, covering both how to use it and how to extend it. In line with the aims of the SaW project, using the software did not require extensive technical knowledge. Kuha2 is still under active development: in addition to maintenance, new features are added continuously. For example, support for the EAD3 format was released in January 2020.

Active development paves the way for wider use

FSD developed the original Kuha in 2014 to transfer descriptive metadata to Finna, a search service that provides information from Finnish archives, libraries and museums, using the OAI-PMH protocol. In 2016, FSD developed the Omicrops server application, which uses the OSMH protocol. Kuha2 grew out of the idea of creating a single unified back-end service that would provide content for both APIs. This unified whole was then split into smaller parts by adopting a microservice architecture: each task became an independent process that communicates with the other processes through standardised APIs.
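To illustrate what "independent processes communicating through standardised APIs" means in practice, here is a minimal Python sketch. It assumes a hypothetical document-store service that exposes `GET /records/<id>` as JSON over HTTP, and a consumer that reaches the store only through that URL. The service, endpoint and record fields are illustrative assumptions, not Kuha2's actual API; for brevity, the store runs in a background thread rather than in a separate process.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical records held by the illustrative "document store" service.
RECORDS = {
    "study-1": {"title": "Example study", "format": "DDI2"},
}


class DocumentStoreHandler(BaseHTTPRequestHandler):
    """Minimal document-store service: GET /records/<id> returns JSON."""

    def do_GET(self):
        prefix = "/records/"
        record_id = self.path[len(prefix):] if self.path.startswith(prefix) else None
        record = RECORDS.get(record_id)
        if record is None:
            self.send_error(404, "record not found")
            return
        body = json.dumps(record).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def start_document_store():
    """Start the store on a free local port in a background thread."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), DocumentStoreHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_record(store_url, record_id):
    """A separate service calls this; it knows only the store's HTTP interface."""
    with urllib.request.urlopen(f"{store_url}/records/{record_id}") as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    server = start_document_store()
    host, port = server.server_address[:2]
    print(fetch_record(f"http://{host}:{port}", "study-1"))
    server.shutdown()
    server.server_close()
```

Because the consumer depends only on the HTTP interface, the store could be rewritten or redeployed without changing the consumer, which is the main benefit of splitting a system into services like this.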

In 2018, CESSDA launched the CESSDA Data Catalogue (CDC), a search service that aims to include descriptive metadata from as many CESSDA service providers as possible. CDC harvests metadata from service providers through the OAI-PMH API. To be integrated into CDC, a service provider therefore needs a data description format compatible with the search service and an open OAI-PMH API that serves its descriptive metadata for harvesting. Kuha2 supports both the OAI-PMH protocol and the DDI2 format used by CDC, and close cooperation between FSD and CESSDA ensured that Kuha2 is compatible with the CESSDA Data Catalogue.

CESSDA harvests metadata provided by service providers into its open Data Catalogue via Kuha2 APIs.
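The harvesting described above follows the OAI-PMH request pattern: the harvester issues `ListRecords` requests and follows `resumptionToken` values until the repository signals the end of the list. The Python sketch below shows that loop in a minimal form. The endpoint URL and the `oai_ddi25` metadata prefix are assumptions for illustration, and the sketch is not CDC's or Kuha2's actual harvester; the HTTP call is passed in as a `fetch` function so the logic can be tried without a live repository.

```python
import urllib.parse
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    resumptionToken is an exclusive argument in OAI-PMH, so it is never
    combined with metadataPrefix.
    """
    params = {"verb": "ListRecords"}
    if resumption_token:
        params["resumptionToken"] = resumption_token
    else:
        params["metadataPrefix"] = metadata_prefix
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None) for one response page."""
    root = ET.fromstring(xml_text)
    identifiers = [
        header.findtext("oai:identifier", namespaces=OAI_NS)
        for header in root.iterfind(".//oai:record/oai:header", OAI_NS)
    ]
    token = root.findtext(".//oai:resumptionToken", namespaces=OAI_NS)
    # A missing or empty resumptionToken element means this is the last page.
    return identifiers, (token or None)


def harvest(base_url, metadata_prefix, fetch):
    """Collect record identifiers from every page of a ListRecords response.

    `fetch` takes a URL and returns the response body as text.
    """
    identifiers, token = [], None
    while True:
        url = list_records_url(base_url, metadata_prefix, token)
        page_ids, token = parse_list_records(fetch(url))
        identifiers.extend(page_ids)
        if token is None:
            return identifiers
```

Against a real repository, `fetch` could be something like `lambda url: urllib.request.urlopen(url).read().decode("utf-8")`; a production harvester would also handle OAI-PMH error responses and store the metadata payloads, not just the identifiers.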

The DDI standard is used for data description in social science data archives, and in practice three versions of the standard are in common use. To benefit a wider range of organisations, Kuha2's import functionality was extended to support DDI1 and DDI3 in addition to DDI2, which FSD uses. Thanks to this broad DDI support and the traction gained by the CESSDA Data Catalogue, Kuha2 is also used outside FSD: to our knowledge, three other CESSDA archives use the software in their organisations. FSD has helped them get started and has customised the client application so that each user's descriptive metadata is interpreted correctly by the software. As Kuha2 has become more widely used, CESSDA has gained new organisations as service providers for the CDC.
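An importer that accepts several DDI versions first has to tell them apart. The Python sketch below shows one simple heuristic based on the root element and its namespace. It assumes that DDI 1.x codebooks use the `http://www.icpsr.umich.edu/DDI` namespace (or none), DDI 2.5 codebooks use `ddi:codebook:2_5`, and DDI 3.x instances use a `ddi:instance:3_x` namespace. It is an illustrative simplification, not a description of how Kuha2's import actually works.

```python
import xml.etree.ElementTree as ET


def detect_ddi_version(xml_text):
    """Guess the DDI major version from the document's root element (heuristic)."""
    root = ET.fromstring(xml_text)
    # ElementTree represents namespaced tags as "{namespace}localname".
    if root.tag.startswith("{"):
        namespace, local_name = root.tag[1:].split("}", 1)
    else:
        namespace, local_name = "", root.tag
    if local_name == "DDIInstance" and namespace.startswith("ddi:instance:3"):
        return "DDI3"
    if local_name == "codeBook":
        if namespace.startswith("ddi:codebook:2"):
            return "DDI2"
        if namespace in ("", "http://www.icpsr.umich.edu/DDI"):
            return "DDI1"
    raise ValueError(f"unrecognised DDI root element: {root.tag}")
```

A real importer would go on to apply version-specific mappings; this step only decides which mapping to use.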

For its users, Kuha2 offers easy-to-use, open, documented and production-ready software that makes it easier to join the CESSDA search service or other services based on harvesting. As use has expanded, bug reports and feature requests now also come from outside our own organisation. FSD is prepared to receive and review source code contributions and, where appropriate, merge them into the software.

Principles of openness in FSD's operations

Cooperation between different actors advances the discoverability and use of research data. Open APIs make descriptive metadata accessible to anyone and promote its reuse for new purposes. Open-source software frees its users from dependence on particular technological solutions and product suppliers. It also improves data security, makes quality assurance easier, and makes ideas and applications accessible to everyone. With these tools, FSD operates in an international field, advancing the development of technological solutions and open science.

More information:

» Kuha2 Documentation on Read the Docs

Toni Sissala
Software Developer
firstname.surname [at] tuni.fi

¹ OSMH (Open Source Metadata Harvester) is a harvesting protocol developed by CESSDA.

This blog entry is also available in Finnish:
Tietoarkiston kehittämä Kuha2 edistää eurooppalaisten aineistojen löydettävyyttä.
