Seal: a suite of distributed software for high-throughput sequencing


Luca Pireddu, Simone Leo, Gianluigi Zanetti. E-mail:


The rapid advancement of DNA and RNA sequencing technologies is producing an exponentially growing stream of data for sequencing centers to process. Computer processing, storage and network capacity, however, do not grow at the same pace. This gap in growth rates makes distributed computing techniques necessary if data processing is to scale with the sequencing operations that produce the data.


Seal is a suite of applications for processing sequencing data that stands out for its scalability: processing capacity can be expanded simply by adding computers, while operating costs stay low. Seal applications run in a distributed manner on the Hadoop framework, applying to DNA analysis the same IT principles that allow Google, Facebook and eBay to process huge volumes of data. Seal's main applications currently allow you to: demultiplex reads; align reads to a reference genome; identify PCR duplicates; sort the reads; and empirically recalibrate the quality of sequenced bases.
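To make the map-and-reduce style of processing more concrete, here is a minimal sketch of how PCR duplicate identification could be expressed as a MapReduce job. It is an illustration only and is not taken from Seal's source code: the `Read` record, the function names, the grouping key (chromosome, position, strand) and the "keep the highest mean quality" rule are all assumptions made for this example, and the shuffle phase that Hadoop would distribute across nodes is simulated here in a single process.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, NamedTuple, Tuple


class Read(NamedTuple):
    """Hypothetical, simplified aligned read (not Seal's data model)."""
    name: str
    chrom: str
    pos: int
    strand: str
    mean_quality: float


Key = Tuple[str, int, str]


def map_read(read: Read) -> Iterator[Tuple[Key, Read]]:
    """Map step: emit each read under its alignment coordinates."""
    yield (read.chrom, read.pos, read.strand), read


def reduce_reads(key: Key, reads: List[Read]) -> Iterator[Tuple[Read, bool]]:
    """Reduce step: within one group, flag all but the best read as duplicates."""
    best = max(reads, key=lambda r: r.mean_quality)
    for r in reads:
        yield r, r is not best


def run_job(reads: Iterable[Read]) -> Dict[str, bool]:
    """Simulate map, shuffle and reduce locally; a framework such as Hadoop
    would instead spread the groups over many nodes."""
    groups: Dict[Key, List[Read]] = defaultdict(list)
    for read in reads:
        for key, value in map_read(read):
            groups[key].append(value)  # shuffle: group values by key

    flags: Dict[str, bool] = {}
    for key in sorted(groups):
        for read, is_duplicate in reduce_reads(key, groups[key]):
            flags[read.name] = is_duplicate
    return flags
```

Calling `run_job` on a list of `Read` records returns a dictionary from read name to a duplicate flag. Because each key group is reduced independently of the others, this kind of job can be spread over more machines as the input grows, which is the scalability property described above.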

Innovative traits

  • scales with the number of computing nodes and the size of the input data;

  • resilient to transient failures in the computing center, thanks to the Hadoop framework on which it is based;

  • easy web-based monitoring of running jobs and activities.

Potential users

Bioinformatics researchers, sequencing center professionals.

Impact sectors

Research centers - Universities - Hospitals - Biotechnology industry.

Other resources

  2. Luca Pireddu, Simone Leo, and Gianluigi Zanetti. Mapreducing a genomic sequencing workflow. In Proceedings of the second international workshop on MapReduce and its applications, MapReduce '11, pages 67–74, New York, NY, USA, 2011.
  3. Luca Pireddu, Simone Leo, and Gianluigi Zanetti. Seal: a distributed short read mapping and duplicate removal tool. Bioinformatics, 27(15):2159–2160, 2011.
