Large-scale biomedical computation on OMERO
The number of domains affected by the big data phenomenon is constantly increasing, in both science and industry, and high-throughput DNA sequencers are among the largest data producers. Building analysis frameworks that can keep up with such a high production rate, however, is only part of the problem. Current challenges include dealing with richly structured data repositories where objects are connected by multiple relationships, managing complex processing pipelines where each step depends on a large number of configuration parameters, and ensuring reproducibility, error control and usability by non-technical staff.
OMERO.biobank is a robust, extensible and scalable traceability framework developed to support large-scale experiments in data-intensive biology. The data management system is built on top of the core services of OME Remote Objects (OMERO), an open source software platform that includes a number of storage mechanisms, remoting middleware, an API and client applications.
- OMERO.biobank’s kernel is complemented with an indexing system that maintains a persistent version of the traceability structure by mapping entities to nodes and actions to edges in a graph database;
- the graph index is implemented with Neo4j;
- the indexing system can manage a large number of items.
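The mapping of entities to nodes and actions to edges can be illustrated with a minimal in-memory sketch. This is not OMERO.biobank code and does not use Neo4j: the `TraceabilityGraph` class, its methods, and the sample entity and action names are all hypothetical, chosen only to show how such a structure lets one trace an item back to everything it was derived from.

```python
from collections import defaultdict


class TraceabilityGraph:
    """Toy stand-in for a graph index: entities are nodes, actions are edges.

    Hypothetical illustration only; not the OMERO.biobank or Neo4j API.
    """

    def __init__(self):
        self.nodes = {}                     # entity id -> entity kind
        self.in_edges = defaultdict(list)   # target id -> [(action, source id)]

    def add_entity(self, entity_id, kind):
        """Register an entity (e.g. a sample or a data set) as a node."""
        self.nodes[entity_id] = kind

    def record_action(self, source_id, action, target_id):
        """Record that `action` produced `target_id` from `source_id`."""
        for entity_id in (source_id, target_id):
            if entity_id not in self.nodes:
                raise KeyError(f"unknown entity: {entity_id}")
        self.in_edges[target_id].append((action, source_id))

    def ancestors(self, entity_id):
        """Return the ids of all entities that `entity_id` derives from."""
        seen = set()
        stack = [entity_id]
        while stack:
            current = stack.pop()
            for _action, source_id in self.in_edges.get(current, ()):
                if source_id not in seen:
                    seen.add(source_id)
                    stack.append(source_id)
        return seen


# Hypothetical chain: individual -> blood sample -> DNA sample -> sequencing data
graph = TraceabilityGraph()
graph.add_entity("individual_1", "Individual")
graph.add_entity("blood_1", "BloodSample")
graph.add_entity("dna_1", "DNASample")
graph.add_entity("seq_1", "SequencingData")
graph.record_action("individual_1", "blood_draw", "blood_1")
graph.record_action("blood_1", "dna_extraction", "dna_1")
graph.record_action("dna_1", "sequencing", "seq_1")

print(sorted(graph.ancestors("seq_1")))
```

In a graph database the same traversal would be expressed as a path query over the stored edges rather than as an explicit loop, which is what makes a persistent index practical when the number of items is large.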
Health - ICT
- G. Cuccuru, S. Leo, L. Lianas, M. Muggiri, A. Pinna, L. Pireddu, P. Uva, A. Angius, G. Fotia, G. Zanetti, "An automated infrastructure to support high-throughput bioinformatics", 2014 International Conference on High Performance Computing & Simulation (HPCS), pp. 600-607, July 2014.
- G. Cuccuru, P. Uva, S. Onano, R. Atzeni, S. Leo, L. Lianas, M. Oppo, L. Pireddu, A. Angius, L. Crisponi, G. Zanetti, G. Fotia, "Exploiting a large scale biodata management system to support NGS variant detection studies", poster presentation at ISMB/ECCB, Dublin, 10-14 July 2015.