This hard drive was unable to withstand the ravages of time. But were the data on it saved before it was too late? | Photo: Carl Ander / Connected Archives

In the spring of 2025, the National Oceanic and Atmospheric Administration (NOAA) lost about 10 percent of its staff – roughly one thousand employees. The Trump administration has promised to cut its budget by 25 percent in 2026 and has threatened to cancel the contracts that pay for data hosting.

It was enough to cause a wave of anxiety throughout the world of environmental research, even more so than during Donald Trump’s first presidency. What if these precious resources were to disappear? Many institutions and individuals quickly mobilised to make copies. The Data Rescue Project, launched in February 2025, coordinates these efforts by keeping a running list of endangered websites and databases and of the rescue work underway.

The ravages of time

The actions of the United States government against science are brazen. But other, less talked-about dangers threaten the sustainability of research data, says Jürgen Enge, the head of IT at the Basel University Library. First, there is time itself, which degrades the physical media holding digital content: the tiny magnetic regions that store bits on a hard drive, for example, can lose their magnetisation, corrupting the data.

Digital media can also be damaged by fire or flood, or end up in the rubble after an earthquake. To mitigate these risks, data repositories keep copies at other institutions in other locations, such as the Swiss foundation Switch. This is known as diversifying risk, i.e., not putting all one’s eggs in one basket.

“Valuable content is duplicated on numerous, high-quality storage media”. Jürgen Enge

Maintaining and securing such archives has become an essential task for university libraries. They no longer just host books and scientific journals, but also support scientists in storing and backing up their research data. “Our system automatically manages the number and type of backup copies”, says Enge. “Valuable content is duplicated on numerous, high-quality storage media. For information that could be reproduced, e.g., a scanned book, backups are fewer and cheaper”. The system thus strikes a compromise between safety and cost.
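As a purely illustrative sketch – the item names, copy counts and media types below are invented, not those of the Basel system – such a policy-driven backup decision might look something like this:

```python
# Illustrative sketch of a tiered backup policy (invented example,
# not the Basel University Library's actual system).
from dataclasses import dataclass

@dataclass
class ArchiveItem:
    name: str
    reproducible: bool  # e.g. a scanned book could be re-digitised if lost

def backup_plan(item: ArchiveItem) -> dict:
    """Choose how many copies to keep and on what kind of media."""
    if item.reproducible:
        # Content that could be regenerated gets fewer, cheaper copies.
        return {"copies": 2, "media": "standard disk", "offsite": 1}
    # Unique research data gets more copies on higher-quality media,
    # including at partner institutions in other locations.
    return {"copies": 4, "media": "high-quality tape + disk", "offsite": 2}

print(backup_plan(ArchiveItem("scanned_book.pdf", reproducible=True)))
print(backup_plan(ArchiveItem("sensor_measurements.csv", reproducible=False)))
```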

Preserving evidence of crimes

Since 2015, the peace research institute Swisspeace has been coordinating Safe Havens for Archives at Risk, an international initiative that shelters archives documenting violations of human rights or humanitarian law when they are threatened by natural disasters, armed conflicts or political interference.

The second threat is the obsolescence of file formats and storage techniques – a situation familiar to anyone who listens to music, which has moved from vinyl and cassettes to CDs and MP3s. The data generated by high-tech microscopes, for example, come in proprietary formats, and reading them often requires specific programs that may need to be updated. Archives therefore have to be transferred regularly to open formats and more modern media – about every five years, according to Enge.
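A toy sketch of what such a migration check could look like – the vendor extensions are real examples of proprietary microscope formats, but the file names, dates and five-year rule of thumb are simplified illustrations:

```python
# Toy sketch: flag archived files that should be migrated to open formats.
# The five-year rule of thumb follows the article; file names and dates are invented.
from datetime import date

PROPRIETARY = {".czi", ".lif", ".nd2"}   # example vendor microscope formats
MIGRATION_INTERVAL_YEARS = 5             # "about every five years", per Enge

def needs_migration(filename: str, archived: date, today: date) -> bool:
    ext = "." + filename.rsplit(".", 1)[-1].lower()
    too_old = (today - archived).days > MIGRATION_INTERVAL_YEARS * 365
    return ext in PROPRIETARY or too_old

print(needs_migration("cells.czi", date(2023, 1, 1), date(2025, 6, 1)))   # True: proprietary format
print(needs_migration("survey.csv", date(2018, 3, 1), date(2025, 6, 1)))  # True: older than five years
```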

Maintenance isn’t sexy

But the biggest danger is probably funding, says Sabina Leonelli of the Technical University of Munich, who studies open science and the impact of digitisation on scientific practices: “There is no viable business model for research data infrastructure. And the amount of information generated by research is increasing exponentially, driven, for example, by cheap genetic sequencing in biomedical science and satellite measurements in environmental research”.

There is clearly a lack of long-term investment, Leonelli laments. “We are more willing to finance new research than the tools to preserve its results. A government that has financed the construction of a bridge is less motivated to finance its maintenance two decades later, as it adds less value. This is even more marked in science and in the digital realm, neither of which is very visible”.

“The use of AI in research in turn requires very well-maintained infrastructure”. Sabina Leonelli

Some leaders hope that AI will solve everything, including the curation of databases, says Leonelli. “But that’s not what we observe on the ground. On the contrary, the use of AI in research in turn requires very well-maintained infrastructure”.

Leonelli regrets that the support needed to maintain a project’s data disappears as soon as the project ends. It then falls to others – libraries or discipline-specific repositories – to finance the hosting of the data. And this comes just as the exponential decline in storage costs is tailing off, says Enge: “Until now, it compensated for the growth in the amount of content, but this is no longer the case”. Unless new, low-cost technologies arrive soon, costs are likely to explode.

Know-how threatened

In addition, the doctoral students and postdocs who produced the results and know how to use them often leave the team shortly after a project ends, taking valuable knowledge with them. Frank Oliver Glöckner, a professor at the University of Bremen in Germany, is concerned that crucial skills will disappear because of the policies of the United States government. He is a specialist in Earth system data science and heads Pangaea, an environmental research data platform that is currently helping to back up content hosted by NOAA.

“Many of these NOAA specialists have lost or left their jobs recently, and I think most will not return”. Frank Oliver Glöckner

“The work done by NOAA scientists is unique”, says Glöckner. “They consolidate the measurements made by international teams by means of different instruments, which therefore form a very heterogeneous whole. But many of these specialists have lost or left their jobs recently, and I think most will not return. Environmental sciences will suffer from the disappearance of these skills, and other people will have to learn how to do this job”.

Simply copying the contents into a file is not very complicated, but it is not very useful either, he says, because the data also has to be dynamically accessible and easy to navigate. His team carries out this crucial curation work for NOAA content as well as for German institutions: making the metadata consistent – for example, the description of each type of measurement – and integrating everything into a structured database so that different types of information can be cross-referenced.
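To give a flavour of this curation step – a simplified, hypothetical sketch, not Pangaea’s actual pipeline – harmonising metadata can mean mapping each source’s field names and units onto a shared vocabulary before the records are loaded into a common database:

```python
# Simplified, hypothetical sketch of metadata harmonisation:
# different sources describe the same measurement in different ways,
# and curation maps them onto one shared vocabulary.
FIELD_MAP = {
    "sst": "sea_surface_temperature_c",
    "sea_surf_temp_F": "sea_surface_temperature_c",
    "temp_surface": "sea_surface_temperature_c",
}

def harmonise(record: dict, source: str) -> dict:
    out = {"source": source}
    for key, value in record.items():
        name = FIELD_MAP.get(key, key)       # fall back to the original field name
        if key == "sea_surf_temp_F":         # convert Fahrenheit to Celsius
            value = (value - 32) * 5 / 9
        out[name] = value
    return out

print(harmonise({"sst": 18.2, "lat": 43.1}, source="buoy_A"))
print(harmonise({"sea_surf_temp_F": 64.8, "lat": 43.0}, source="ship_B"))
```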

Data and software, all included

The Swiss Renku project wants to go further than curation. The platform allows scientists to connect their data to the software used for their analyses and to a computing environment in which to run them.

“Having to install software before being able to use content is a barrier for many scientists”, says Rok Roskar, who leads the Renku project at the Swiss Data Science Centre, an initiative of ETH, EPFL and PSI. By providing a complete, ready-to-use package, the platform makes it possible to run the algorithms and thus to reproduce and validate published results. It also makes it easier to integrate these resources into new research projects in other disciplines.
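As a rough illustration of the underlying idea – an invented manifest, not Renku’s actual format or API – a project can record its data, code and computing environment together so that someone else can verify and rerun the analysis:

```python
# Rough illustration of the "data + code + environment" idea
# (invented manifest, not Renku's actual format or API).
import hashlib
import json
import platform
import sys

def dataset_checksum(path: str) -> str:
    """Fingerprint the input data so reruns can verify they use the same file."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def write_manifest(data_path: str, script: str, out: str = "manifest.json") -> None:
    manifest = {
        "data": {"path": data_path, "sha256": dataset_checksum(data_path)},
        "code": script,
        "environment": {
            "python": sys.version.split()[0],
            "platform": platform.platform(),
        },
    }
    with open(out, "w") as f:
        json.dump(manifest, f, indent=2)

# Example usage (assuming "measurements.csv" and "analysis.py" exist):
# write_manifest("measurements.csv", "analysis.py")
```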

“Having to install software before being able to use content is a barrier for many scientists”. Rok Roskar

The goal is to promote and facilitate the reuse of results – one of the central objectives of the open research data movement – and to avoid the risk that content hosted in repositories “ends up dying in oblivion”.

The platform establishes an interface with the servers of the institutions hosting the data, says Roskar: “This is the somewhat political aspect of my work. It is crucial for Renku’s sustainability that all partners are committed for the long term”.