Digital Humanities and data capture

Data capture is normally focussed on the data-intensive scientific disciplines. However, the ANU has a growing involvement in the Digital Humanities and in other related disciplines.

Increasingly, researchers in the Humanities and Social Sciences amass large collections of digital material such as images, video recordings, text-based materials and sound recordings.

In many cases the linkages between these items are as important as the items themselves. For example, the documentation of an Aboriginal ceremony might legitimately include video recordings of the participants, photographs of costume, recordings of both spoken language and song, and digitised copies of reports by early explorers and anthropologists describing the ceremony as they experienced it.

Researchers in the Humanities and Social Sciences are also not noted for publishing collections of their research material. This is even more true of the creation and publication of linkages between individual items in their research collections.

The aim of this project is to facilitate the publication of existing data, metadata and contextual information from research work discovered during the Seeding the Commons project and other data audits, as well as of new data generated during ongoing and new projects.

It is recognised that the majority of these small datasets are located within the College of Arts and Social Sciences and the College of Asia and the Pacific.

By identifying these datasets and assisting in their publication, the Division of Information hopes to encourage the development of a culture of dataset publication within these disciplines.

The complexity of generating structured datasets encoded with standard metadata may be a barrier to the publication of datasets within the traditionally less numerate disciplines. The Division of Information therefore seeks to build on the work of the ANU Seeding the Commons project by modifying the data capture workflows to ensure the production of good quality metadata, encoded according to standard schemata, as an aid to data interchange and reuse.

The ‘capture’ in this case is from an inaccessible form with poor quality metadata to an accessible form with good quality metadata. The creation of software infrastructure to effect this transformation is what will be funded. This software infrastructure will be used both for legacy datasets (fixing the past) and for the creation of new datasets (fixing the future). It will support adding links to, and managing metadata for, pre-existing digital data, and will act as a standard data management and collection assembly tool to accompany the creation of new digital data.
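As an illustration only, the kind of collection assembly described above might be sketched as follows. This is a minimal sketch, not the project's design: the class names (Item, Link, Collection), the metadata fields, the relation labels and the sample records are all hypothetical and chosen purely to show items, links between items, and a check for missing metadata.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """One digital object in a research collection (hypothetical model)."""
    identifier: str
    title: str
    media_type: str      # e.g. "video", "image", "audio", "text"
    creator: str = ""    # left empty when unknown, as is common in legacy data


@dataclass
class Link:
    """A typed, directed relationship between two items."""
    source: str          # identifier of the item the link starts from
    relation: str        # free-text label here; a real tool would use a vocabulary
    target: str          # identifier of the item the link points to


@dataclass
class Collection:
    """A set of items plus the linkages between them."""
    title: str
    items: dict = field(default_factory=dict)
    links: list = field(default_factory=list)

    def add_item(self, item):
        self.items[item.identifier] = item

    def add_link(self, source, relation, target):
        # Refuse links that mention items not held in the collection.
        for identifier in (source, target):
            if identifier not in self.items:
                raise KeyError(f"unknown item: {identifier}")
        self.links.append(Link(source, relation, target))

    def related_to(self, identifier):
        """Return identifiers of items linked to or from the given item."""
        related = set()
        for link in self.links:
            if link.source == identifier:
                related.add(link.target)
            elif link.target == identifier:
                related.add(link.source)
        return sorted(related)

    def missing_metadata(self):
        """Return identifiers of items that still lack a creator."""
        return sorted(i.identifier for i in self.items.values() if not i.creator)


# Invented sample records, loosely following the ceremony example.
ceremony = Collection("Ceremony documentation")
ceremony.add_item(Item("vid-1", "Video of participants", "video", "A. Researcher"))
ceremony.add_item(Item("img-1", "Photograph of costume", "image"))
ceremony.add_item(Item("txt-1", "Digitised report", "text", "B. Anthropologist"))
ceremony.add_link("img-1", "shows costume worn in", "vid-1")
ceremony.add_link("txt-1", "describes", "vid-1")

print(ceremony.related_to("vid-1"))
print(ceremony.missing_metadata())
```

In this sketch the links are stored alongside the items rather than inside them, so linkages can be added to pre-existing material without altering the items themselves. The missing-metadata check stands in for the step of moving legacy data towards good quality metadata.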

This work is also allied to the ANU’s involvement in the workspace and collections interoperability strands of Project Bamboo (www.projectbamboo.org).