Sally Wyatt on making data FAIR

This week, our Dean of Research, Sally Wyatt, tells us about the importance of making research data FAIR, the challenges we may face when making research data FAIR and how she has made her research data FAIR.

  1. Why is it important to make research data FAIR? Why do you encourage colleagues to make their data FAIR?

It has always been important for researchers to make their data available, and in that sense there is nothing new about FAIR data. Openness and transparency have long been ideals for researchers. Making one’s data available to others is how we judge the validity of claims and arguments made by others, and how others judge us. It’s also a sign of personal research integrity, and important for the credibility and trustworthiness of research and science more generally. Doing research and sharing your data and results are central to our role as researchers. I sometimes think of doing research and publishing as being part of an ongoing dialogue not only with our peers living now, but also with those from the past and in the future.

Robert Merton (1942, ‘The normative structure of science’) talks about the importance of sharing and openness. One of his four norms of science is ‘communalism’, the idea that the results of research should be available to all: science is a collective activity. Our reward system is built on recognition in the form of reputation and esteem, built up over time. It is not built on private gain through hoarding results. Another of Merton’s norms is ‘organised scepticism’. This is the idea that scientific claims should be subject to scrutiny, and that can only happen if data are shared.

  2. What are the potential challenges researchers may face when making their data FAIR? How do you recommend they overcome these challenges?

For research conducted in universities, paid for by public funds, the default position should be that data are open to other researchers and to anyone else who might want access, including policy makers and civil society organisations. No one is really against making data open, except some private companies wanting to protect their future profits, as we regrettably now see with some covid vaccines.

But the devil, as always, is in the detail. One problem is that there is not a shared understanding of ‘data’. Some disciplines, especially in the humanities, prefer to talk of sources, many of which are already publicly available in archives and libraries so there is no point in duplicating those in other data repositories. Other disciplines, especially in the qualitative social sciences and in medicine, might have sensitive personal information. Our ethical obligation to protect respondents might hinder sharing of some data. Plus we might have built up long-term relationships of trust with respondents or interlocutors. Just as our friends might not appreciate us sharing secrets on social media, some of those long-term research partners might not appreciate details being shared that could jeopardise their own positions.

Another important challenge is that the context in which data are collected matters a great deal. We know from the history and philosophy of science that reproducing experiments is really difficult, and that one needs lots of information about the instruments used, the temperature of the lab, the sort of mouse or plant, etc. Without that kind of information, it can be very difficult to interpret the results, much less repeat them. That’s also true of data collected by qualitative means. One needs to know what the research questions were and how these were operationalized in questionnaires, interviews or observations. Without that kind of context, it can be very difficult to reuse data collected by someone else. And this is all before we get to challenges around categorization, and how categories might change over time or between contexts. It is not only about making data open, but also about making available the instruments one used to collect and analyse data. And that is a lot of work.

That brings me to the final challenge – our system of recognition does not give much weight to making data open. It emphasizes publications rather than the work of sharing data. And it can be a lot of work to make data meaningful to others. The work of making data available in a form that is intelligible to others is often invisible and under-valued. That is beginning to change, with the emergence of data journals and data citation, but there is still much to be done.

Recently, one of my PhD students, Kathleen Gregory (graduated cum laude on 3 March 2021), wrote a wonderful PhD thesis called ‘Findable and reusable? Data discovery practices in research’. She combined qualitative and quantitative methods to understand how researchers in different ‘data communities’ (which overlap with but are not identical to (sub)disciplines) find and reuse data. She also wrote a ‘data paper’ for the journal Scientific Data to explain to others how to find and use her data. It was a very boring thing to write, she tells me, but these sorts of data papers are necessary both to help other researchers re-use data and to give researchers credit for making data available.

  3. Why does FASoS focus on the findability and accessibility of research and not so much on the interoperability and reusability?

The previous answer hints at that. Findable and accessible emphasise openness, and I prefer to talk about open data rather than FAIR data (this is reflected in a report I helped to write for the International Science Council in 2015, called ‘Open Data in a Big Data World’). The reason is that the I and the R are technical specifications that describe data that can be read by computers, in other words, data that are machine readable. To use such data, a researcher also needs access to good computers and networks, which we in Maastricht might take for granted (well, maybe not after the cyberattack) but which are not available everywhere. If we insist that data must be machine readable in order to be FAIR, we could be disadvantaging those researchers working in poorly funded universities. This could lead to greater inequalities, or even scientific neo-colonialism if data generated by or about poorer parts of the world can only be analysed in richer countries.

By emphasising the machine-readable requirement, we are also ignoring the fact that a lot of data and sources are not available in digital form. For example, in the Dutch National Library (KB), less than 1% of books published since 1960 are digitally available, and only 20% of those published between 1940 and 1959.

  4. How have you made your research data FAIR?

My very first academic job (from 1980 to 1986) was at SPRU (Science Policy Research Unit, University of Sussex), as an incredibly young and junior research assistant for Luc Soete (former rector of Maastricht University). We were working on developing indicators to measure technological competitiveness. Some of that involved publicly available data, from patent offices for example. But we also drew on Project SAPPHO, a database of both successful and failed innovations that SPRU colleagues had started in 1968. It was full of amazing data, collected by many researchers over two decades. The data are still used and the methodology was reproduced in other countries.

Otherwise, I have tended to use secondary data already in the public domain, or deeply contextual qualitative data. For the project with Kathleen (see above), we made the data and code available via DANS (Data Archiving and Networked Services, an institute of the KNAW and NWO).

  5. What would you advise colleagues at FASoS to do when they want to make their data FAIR? Where should they start?

That’s an easy question. They should contact our wonderful data steward, Maria Vivas-Romero. Maria works for FASoS for one day a week, and spends the rest of her time at the library. Luckily, Maria herself has a PhD in anthropology, so she understands some of our specific needs. It is not only about making your data available to others, when that’s feasible and appropriate; it’s also about making your data available to your future self. The starting point is figuring out how you are going to manage your own data, so you can easily find them at a later stage and so they don’t disappear on a single memory stick or in a box under your desk. Even before contacting Maria, check the ‘Nine golden rules for good research data management’.

FASoS Weekly © 2024 All Rights Reserved