Scalable genomics tools, powered by Apache Flink

JEENK

Challenge

The rapid advancement of DNA and RNA sequencing technologies is driving exponential growth in the volume of data that sequencing centers must process. The falling cost of data acquisition enables new large-scale applications, but the conventional computational techniques used to process the data hold them back.

Overview

Jeenk is a collection of parallel, distributed tools that bring the distributed stream computing approach to large-scale genomics data analysis. Jeenk is built on the Apache Flink stream processing framework and uses Apache Kafka for data movement.

It consists of three Flink-based tools that implement a full raw-to-CRAM pipeline for Illumina data:

  • A reader that reads the proprietary raw Illumina BCL files directly from the sequencer's run directory and converts them to read-based (FASTQ-like) data, which are sent to a Kafka broker for storage and further processing (akin to Illumina's bcl2fastq2);
  • An aligner that aligns the reads to a reference genome using the BWA-MEM plugin through the RAPI library (http://github.com/crs4/rapi/);
  • A CRAM writer that writes the aligned reads as space-efficient CRAM files.
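
The flow of data through the three stages can be illustrated with a small, self-contained Python sketch. It is not Jeenk's code and uses neither Flink nor Kafka: the `Topic` class is a hypothetical in-memory stand-in for a Kafka topic, `align` is a toy exact-match search standing in for BWA-MEM, and `format_record` emits tab-separated text rather than CRAM. The only format detail it relies on is the BCL byte layout (base in the two low bits, quality in the six high bits, a zero byte meaning no call). The stages also run one after another here, whereas a streaming deployment would run them concurrently.

```python
from collections import deque
from typing import Iterator, List, Tuple

BASES = "ACGT"
Read = Tuple[str, str, str]          # (name, sequence, quality string)
Aligned = Tuple[str, int, str, str]  # (name, position, sequence, quality string)


def decode_bcl_byte(b: int) -> Tuple[str, int]:
    """Decode one BCL byte: low 2 bits = base, high 6 bits = quality."""
    if b == 0:  # a zero byte marks a no-call
        return "N", 0
    return BASES[b & 0b11], b >> 2


def reader(cycles: List[bytes]) -> Iterator[Read]:
    """Turn cycle-major base calls into read-major, FASTQ-like records.

    cycles[c][i] is the byte for cluster i at sequencing cycle c.
    """
    for i in range(len(cycles[0])):
        calls = [decode_bcl_byte(cycle[i]) for cycle in cycles]
        seq = "".join(base for base, _ in calls)
        qual = "".join(chr(q + 33) for _, q in calls)  # Phred+33 encoding
        yield (f"read{i}", seq, qual)


class Topic:
    """Hypothetical in-memory queue standing in for a Kafka topic."""

    def __init__(self) -> None:
        self._queue: deque = deque()

    def send(self, record) -> None:
        self._queue.append(record)

    def poll(self) -> Iterator:
        while self._queue:
            yield self._queue.popleft()


def align(read: Read, reference: str) -> Aligned:
    """Toy aligner: exact substring search; position -1 means unmapped."""
    name, seq, qual = read
    return (name, reference.find(seq), seq, qual)


def format_record(rec: Aligned) -> str:
    """Toy writer: one tab-separated line per aligned read (not CRAM)."""
    name, pos, seq, qual = rec
    return f"{name}\t{pos}\t{seq}\t{qual}"


def run_pipeline(cycles: List[bytes], reference: str) -> List[str]:
    raw_reads, aligned_reads = Topic(), Topic()
    for read in reader(cycles):            # stage 1: reader -> topic
        raw_reads.send(read)
    for read in raw_reads.poll():          # stage 2: aligner -> topic
        aligned_reads.send(align(read, reference))
    return [format_record(r) for r in aligned_reads.poll()]  # stage 3: writer


if __name__ == "__main__":
    # Two clusters, three cycles, every call at quality 30: (30 << 2) | base.
    cycles = [bytes([120, 122]), bytes([121, 123]), bytes([122, 120])]
    for line in run_pipeline(cycles, "TTACGTA"):
        print(line)
```

Decoupling the stages through topics is what lets each tool be scaled and restarted independently; in this sketch the same idea shows up only as the two `Topic` objects between the stages.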

Innovative features

  • ultra-scalable state-of-the-art distributed stream processing technology;
  • reduced turnaround times.

Potential users

Bioinformatics researchers, sequencing center professionals

Impact sectors

Biotechnologies

Other resources

  1. https://github.com/crs4/Jeenk
  2. F. Versaci, L. Pireddu, G. Zanetti, "Scalable genomics: From raw data to aligned reads on Apache YARN", Proc. IEEE Int. Conf. Big Data (Big Data), pp. 1232-1241, Dec. 2016.
  3. F. Versaci, L. Pireddu, G. Zanetti, Proc. IEEE EMBS Int. Conf. on Biomedical & Health Informatics (BHI), Vol. 2018, pp. 259-262, 2018
