Documenting and making sense of digital research processes: findings from an international survey of archaeologists

Submitted by Isto Huvila on Wed, 12/15/2021 - 16:17

Presentation, together with Olle Sköld and Lisa Börjesson, at the Digital Humanities in the Nordic and Baltic Countries (DHNB) 2022 conference organised in Uppsala, Sweden, on the findings from a survey study conducted as part of the CAPTURE project.


Data-intensive research in digital humanities disciplines requires a thorough understanding of the data used in the research. Earlier surveys of researchers representing several humanities and social science disciplines have repeatedly reported the importance of understanding and conveying the context of research data as a key antecedent of its (re)usability in secondary work. Besides publications and findings, research data is increasingly seen as an outcome that makes a difference, extends the impact of research beyond the primary findings and observations, and can effectively put research into action a long time after it was first conducted. One part of the context a researcher needs to understand, and one that has not so far been studied to a considerable extent, is how the data was created and how it has been manipulated.

The aim of this presentation is to report preliminary findings of an international survey of archaeologists conducted in 2021. The survey asked what archaeologists who are using or have used different types of data need to know about the data to (re)use it effectively, and what archaeologists who have published data for (re)use consider important for others to know about their data. The focus of the presentation is on findings relating to needs regarding knowledge about data creation and processing.

The data was collected using a web survey directed to archaeologists who had been creating and/or using data, understood in a broad sense, in their research. The study is a part of the ERC-funded research project CAPTURE on the documentation of data production and use. The survey was distributed using a wide range of primarily online channels and personal contacts. The analysed data consist of 90 responses (N = 90). The statistical analyses reported in the presentation were conducted in R 4.0.3, and the qualitative open-ended coding was carried out in NVivo 1.5.

The analysis shows that the respondents considered contextual and processual information (paradata) essential for successful data (re)use. Paradata is used for understanding multiple aspects of the data, but the exact information needed depends on the type of data and how it is used. On the basis of the findings, the presentation discusses feasible strategies to capture and produce relevant paradata for diverse user needs.