Seal: a suite of distributed software for high-throughput sequencing
The rapid advancement of DNA and RNA sequencing technologies is producing an exponential increase in the volume of data that sequencing centers must process. Computer processing, storage and network capacity, however, do not grow at the same pace. This gap in growth rates calls for distributed computing techniques, so that data processing operations can scale with the sequencing operations that produce the data.
Seal is a suite of applications for processing sequencing data that stands out for its scalability: processing capacity can be expanded simply by adding computers, while operating costs remain low. Seal applications run in a distributed manner on the Hadoop framework, applying to DNA analysis the same IT principles that allow Google, Facebook and eBay to process huge volumes of data. Seal's main applications currently allow you to: demultiplex reads; align reads to a reference genome; identify PCR duplicates; sort the reads; and empirically recalibrate the quality scores of sequenced bases.
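To give a feel for the map/reduce style of processing that Hadoop supports, here is a minimal single-process sketch of duplicate identification in Python. It is not Seal's code: the read record layout, the function names (`map_read`, `reduce_group`, `mark_duplicates`) and the rule of keeping the highest-quality read per position are all assumptions made for illustration, and a real Hadoop job would spread the same steps across many nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical read record: (name, chromosome, position, strand, quality_score)
Read = Tuple[str, str, int, str, int]
Key = Tuple[str, int, str]


def map_read(read: Read) -> Tuple[Key, Read]:
    """Map step: key each aligned read by its mapping coordinates."""
    _name, chrom, pos, strand, _qual = read
    return (chrom, pos, strand), read


def reduce_group(reads: List[Read]) -> List[Tuple[str, bool]]:
    """Reduce step: keep the best-quality read, flag the others as duplicates."""
    best = max(reads, key=lambda r: r[4])  # first read with the top score
    return [(r[0], r[0] != best[0]) for r in reads]


def mark_duplicates(reads: Iterable[Read]) -> Dict[str, bool]:
    """Run map, shuffle (group by key) and reduce in one process."""
    groups: Dict[Key, List[Read]] = defaultdict(list)
    for read in reads:
        key, value = map_read(read)
        groups[key].append(value)  # the "shuffle": same key, same group
    flags: Dict[str, bool] = {}
    for group in groups.values():
        flags.update(reduce_group(group))
    return flags


# Made-up example data: r1 and r2 map to the same position and strand.
example_reads: List[Read] = [
    ("r1", "chr1", 100, "+", 30),
    ("r2", "chr1", 100, "+", 35),
    ("r3", "chr1", 250, "-", 28),
]
print(mark_duplicates(example_reads))
# prints {'r1': True, 'r2': False, 'r3': False}
```

Because each group of reads sharing a key is reduced independently, the groups can in principle be handled on different machines, which is the property that lets this kind of workload grow by adding nodes.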
- highly scalable with respect to both the number of computing nodes and the size of the input data;
- resilient to transient problems in the computing center, thanks to the Hadoop framework on which it is based;
- easy web-based monitoring of processing jobs and activities in progress.
Bioinformatics researchers, sequencing center professionals.
Research centers - Universities - Hospitals - Biotechnology industry.
- Luca Pireddu, Simone Leo, and Gianluigi Zanetti. Mapreducing a genomic sequencing workflow. In Proceedings of the second international workshop on MapReduce and its applications, MapReduce '11, pages 67–74, New York, NY, USA, 2011.
- Luca Pireddu, Simone Leo, and Gianluigi Zanetti. Seal: a distributed short read mapping and duplicate removal tool. Bioinformatics, 27(15):2159–2160, 2011.