ACDC Is Making Data FAIR and Safe: Our New Rsync-Based Backup Tool

In today’s research landscape, data is the lifeblood of discovery. But as datasets grow and experiments become more complex, keeping data both safe and FAIR (Findable, Accessible, Interoperable, and Reusable) is a challenge that every scientific group faces. That’s why our team has developed a new backup tool, designed specifically to make data management seamless, reliable, and transparent for scientists of all backgrounds.

At its core, our tool is a smart, Windows-based solution that leverages the power of rsync—a trusted workhorse in the world of data synchronization. But we’ve gone far beyond the basics. The tool comes in two flavors: one for those who prefer to trigger backups manually, and another for those who want everything to happen automatically in the background, even during off-hours.

What sets our approach apart is its focus on selectivity and transparency. Instead of blindly copying everything, the tool backs up only those folders that have changed recently: by default, anything modified in the last two days, though this window is customizable. This saves time and storage space and keeps backup operations efficient and relevant. Each backup is logged, with a log file placed in every folder that has been updated, so you always know what was backed up and when, and you can easily track your data's journey.
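
To make the idea of selecting recently changed folders concrete, here is a minimal Python sketch. It is an illustration only, not the tool's actual code: the function name `recently_modified_folders`, the `max_age_days` parameter, and the choice to inspect file modification times inside each top-level folder are all assumptions.

```python
import os
import time
from pathlib import Path


def recently_modified_folders(root, max_age_days=2.0, now=None):
    """Return top-level subfolders of `root` that contain at least one
    file modified within the last `max_age_days` days (hypothetical helper)."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds in a day
    selected = []
    for folder in sorted(Path(root).iterdir()):
        if not folder.is_dir():
            continue
        # Walk the folder and stop at the first sufficiently recent file.
        for dirpath, _dirnames, filenames in os.walk(folder):
            if any(
                os.path.getmtime(os.path.join(dirpath, name)) >= cutoff
                for name in filenames
            ):
                selected.append(folder)
                break
    return selected
```

Calling `recently_modified_folders("D:/measurements")` would, under these assumptions, return only the folders worth handing to rsync, so untouched folders are never scanned by the transfer itself.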

We know that scientific data often lives on network shares and needs to be accessible across different computers and instruments. Our tool handles all the heavy lifting: it authenticates to network shares, mounts volumes as needed, and even manages file permissions to ensure that your data remains accessible to those who need it, while staying secure.

Automation is a key feature, especially for busy labs. The automated version of our tool can be scheduled to run at any time using Windows Task Scheduler. It even offers a graceful shutdown option: after a backup completes, the system can automatically power down, saving energy and reducing wear on lab computers. And if you’re still working, you can easily cancel the shutdown with a click.
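
A delayed, cancellable shutdown of this kind can be sketched with the standard Windows `shutdown` command, where `/s` requests a shutdown, `/t` sets a delay in seconds, `/c` attaches a message, and `/a` aborts a pending shutdown. The Python wrapper below is a rough illustration under the assumption that the tool does something similar; the function names and the 60-second default are hypothetical.

```python
import subprocess
import sys


def shutdown_command(delay_seconds=60):
    """Build the Windows command that schedules a shutdown after a delay."""
    return [
        "shutdown", "/s",
        "/t", str(delay_seconds),
        "/c", "Backup finished; shutting down.",
    ]


def cancel_command():
    """Build the Windows command that aborts a pending shutdown."""
    return ["shutdown", "/a"]


def schedule_shutdown(delay_seconds=60):
    """Schedule the shutdown; refuses to run on non-Windows systems."""
    if sys.platform != "win32":
        raise RuntimeError("This sketch only targets Windows.")
    subprocess.run(shutdown_command(delay_seconds), check=True)
```

The delay is what makes the shutdown graceful: anyone still at the machine has time to trigger the cancel command before the system powers down.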

We’ve also built in robust error handling and comprehensive logging, both locally and remotely. If something goes wrong, dedicated error logs make troubleshooting straightforward. All logs are organized by computer and date, and aggregated in a central location, supporting the transparency and traceability that FAIR data principles demand.
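
One plausible way to organize logs by computer and date is to encode both in the directory layout. The sketch below assumes a layout of `<log root>/<computer name>/<ISO date>/<kind>.log`; the layout, the function name `central_log_path`, and the `kind` values are illustrative assumptions rather than the tool's documented behavior.

```python
import socket
from datetime import date
from pathlib import Path


def central_log_path(log_root, computer=None, day=None, kind="backup"):
    """Return the log file path for one computer on one day (hypothetical layout)."""
    computer = computer or socket.gethostname()  # default to this machine's name
    day = day or date.today()
    return Path(log_root) / computer / day.isoformat() / f"{kind}.log"
```

With this layout, a separate error log sits next to the regular log (`kind="error"`), and finding everything a given instrument PC did on a given day is a matter of opening one directory.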

Updates are handled automatically via GitHub, so you’re always running the latest version without any manual intervention. Configuration is simple and flexible, using a human-readable YAML file to define everything from source and destination directories to network settings and log locations.
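
As a rough illustration of how such a configuration might be consumed, the sketch below validates a mapping of the kind a YAML parser (for example PyYAML's `yaml.safe_load`) would return. The key names `source_dir`, `destination_dir`, `log_dir`, and `max_age_days` are assumptions made for this example; the tool's real configuration keys may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupConfig:
    """Hypothetical subset of the settings a backup run needs."""
    source_dir: str
    destination_dir: str
    log_dir: str
    max_age_days: float = 2.0  # default window described in the article

    @classmethod
    def from_mapping(cls, data):
        """Build a config from a parsed YAML mapping, failing early on gaps."""
        required = ("source_dir", "destination_dir", "log_dir")
        missing = [key for key in required if key not in data]
        if missing:
            raise ValueError(f"missing config keys: {', '.join(missing)}")
        return cls(
            source_dir=str(data["source_dir"]),
            destination_dir=str(data["destination_dir"]),
            log_dir=str(data["log_dir"]),
            max_age_days=float(data.get("max_age_days", 2.0)),
        )
```

Validating the configuration up front means a typo in the file surfaces as a clear error message before any data is touched, rather than as a failed transfer halfway through a night-time run.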

Under the hood, the tool is written in Python and relies on proven technologies like rsync and Cygwin for file operations, along with a suite of Python libraries for everything from logging to network integration.
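
One practical detail of running rsync under Cygwin is that a path such as `C:\data` looks like a remote host named `C` to rsync, so drive-letter paths are normally rewritten in the `/cygdrive/c/data` form first. The sketch below shows that conversion and a minimal command line using rsync's archive flag `-a`. The helper names and the choice of flags are assumptions for illustration; the tool's actual rsync options are not documented here.

```python
from pathlib import PureWindowsPath


def to_cygwin_path(windows_path):
    """Convert an absolute drive-letter path to Cygwin's /cygdrive form."""
    path = PureWindowsPath(windows_path)
    drive = path.drive  # e.g. "C:"
    if len(drive) != 2 or drive[1] != ":" or not path.root:
        raise ValueError(f"expected an absolute drive-letter path, got {windows_path!r}")
    parts = path.parts[1:]  # drop the drive anchor, keep the folder names
    return "/".join(["/cygdrive", drive[0].lower(), *parts])


def build_rsync_command(source, destination):
    """Build an rsync invocation that copies the contents of source into destination."""
    return [
        "rsync", "-a",
        to_cygwin_path(source) + "/",       # trailing slash: copy contents, not the folder itself
        to_cygwin_path(destination) + "/",
    ]
```

Building the command as a list rather than a single string lets it be passed directly to `subprocess.run` without shell quoting problems when folder names contain spaces.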

In short, our backup tool is more than just a safety net—it’s a step toward making scientific data management smarter, more efficient, and fully aligned with the FAIR principles. Whether you’re a computational biologist, a physicist, or a data steward, this tool is designed to fit seamlessly into your workflow, giving you peace of mind and more time to focus on what really matters: your research.

If you are interested in this tool, please do not hesitate to contact us.