Nine researchers from the CRS4 Visual and Data-intensive Computing group contributed to the design and development of a digital research resource integrating clinical, genomic, and pathology imaging data from 26 biobanks across 12 European countries.
An article describing the Colorectal Cancer Cohort (CRC-Cohort) dataset has been published in Nature Scientific Data. The dataset is the result of several years of research and development carried out within the European biobanking infrastructure BBMRI-ERIC. It integrates clinical data, genomic data, and digital pathology images (whole-slide images) collected from 26 biobanks in 12 European countries, harmonizing data from over 10,000 patients diagnosed with colorectal cancer.
Colorectal cancer is one of the most common and deadly cancers worldwide, with increasing incidence and mortality rates. This makes it a major global health challenge, requiring new strategies for prevention, early diagnosis, and targeted, personalized therapies. Since this type of cancer develops slowly from pre-malignant lesions, screening and surveillance are crucial for detecting it at an early stage and improving patient outcomes. The cohort represents a key resource for advancing research on analytical methods, biomarkers, precision medicine, and epidemiological studies.
The work described in the article is the result of a large international scientific collaboration involving dozens of research institutions and more than sixty academic and scientific affiliations across Europe. At CRS4, nine researchers from the Visual and Data-intensive Computing group (Francesca Frexia, Cecilia Mascia, Alessandro Sulis, Giovanni Delussu, Mauro Del Rio, Vittorio Meloni, Luca Pireddu, Simone Leo, and Marco Enrico Piras) contributed to different components of the digital infrastructure.
The CRS4 team contributed to the study and definition of models and tools for integrating heterogeneous data from multiple centers in a harmonized and interoperable way. This work enabled the development of a state-of-the-art data model aligned with the FAIR principles, which is essential for the future inclusion of the dataset in the European Health Data Space (EHDS). In particular, the CRS4 group contributed to the transformation and management of clinical data according to openEHR specifications, as well as to the development of interoperability mechanisms based on international standards such as OMOP and HL7 FHIR, enabling integration with European federated platforms for biomedical research.
CRS4 also contributed to the processing of digital histopathology images. This included converting them into the open OME-TIFF format, which improves data accessibility and removes vendor lock-in, and applying FAIR principles to the processing workflows to enhance data traceability and reproducibility.
This work builds on long-standing collaborations between the CRS4 Visual Computing group, BBMRI-ERIC and its Italian node BBMRI.it, as well as on the activities of European research centers involved in developing digital infrastructures and biobanks for sharing biomedical data. It has also been supported by several research projects, including EOSC-Life (Horizon 2020, GA 824087) and XDATA, funded by the Autonomous Region of Sardinia.
The full article is available on Nature Scientific Data:
https://doi.org/10.1038/s41597-026-06822-2