In 2007, my research team, then composed of Patrizia Zavattari, Magdalena Zoledziewska, Maristella Pitzalis, Michael Whalen, Lello Murru, Becky Lewis and others, moved from the Microcitemico Hospital in Cagliari to the Polaris Technology Park in Pula.
The transfer was encouraged by pharmacologist Luca Pani, who had already been working from the Park for several years with his CNR research group.
Setting up a sequencing infrastructure for the comprehensive, agnostic study of how genetic variability affects our health and the origin and evolution of our species (Homo sapiens sapiens) had been a goal of mine for more than 10 years and, as Luca suggested, we could only achieve it by establishing a large footprint at the Park.
I immediately jumped at the opportunity. Luca arranged a meeting with Giuliano Murgia and Chicco Marcheschi, respectively President and Managing Director of the Technology Park company. They both greeted my proposal enthusiastically, starting a relationship of mutual appreciation and friendship that lasts to this day.
Within a few months, we moved to the Park. Some time earlier, I had met two Cambridge researchers, Shankar Balasubramanian and David Klenerman, who had designed the Solexa platform, based on a new massively parallel nucleotide sequencing technique called ‘sequencing by synthesis’ (SBS), which became the basis for the new-generation sequencers now most widely used worldwide (Solexa was acquired by Illumina in January 2007).
At the time, there was great uncertainty as to which approach and platform was best. But after talking with Shankar, David and other colleagues and friends at Cambridge University, I became convinced that the Solexa (later Illumina) approach was the best one, so we chose it from the start. Among those friends was Roberto Cipolla, a computer vision genius of Italian ancestry who, using star-pointing technology, had helped to create for Solexa the ‘CCD camera’, which captured and converted the fluorescence signals at the millions of points where the SBS reactions took place on a small microscope slide.
Once at the Park, we soon struck up a close cooperation and strategic alliance with Paolo Zanella, who was President of the CRS4 at the time, and his closest collaborators: Chris Jones, Roman Tirler and Gianluigi Zanetti. Paolo impressed me as a man with a vision, full of enthusiasm: he and his close collaborators immediately and generously decided to lend their support to our projects.
We had four non-trivial hurdles to overcome: 1) finding the funds to start the project; 2) setting up a team able to carry it forward; 3) designing the optimal procedure for processing and then analysing the sequencing data; and 4) obtaining adequate computing and storage power.
On Paolo Zanella’s suggestion, and with the help of Roman Tirler, who was experienced in EU funding schemes, as early as 2007 we started writing and submitting various project proposals, mostly within the 7th Framework Programme for Research. Those first proposals were not approved, perhaps because they were ahead of their time.
But around late 2008/early 2009, we started to obtain international and national funding, which allowed us to start the process. The sequencers were initially purchased by the regional agency Sardegna Ricerche, which entrusted their management to CRS4. A key role was played by Chris Jones: he put us in touch with leading members of the Sanger Center in Cambridge (UK) who, through the 1000 Genomes Project, were setting up an advanced genome sequencing centre. We started by sequencing an extensive collection of DNA from patients with autoimmune diseases and healthy controls, which our team had built in previous decades.
From May 2009, after I became director of the CNR’s Institute of Neurogenetics and Neuropharmacology (soon renamed, on my proposal, ‘Institute of Genetics and Biomedical Research’), we were also able to use for our projects an extensive collection of DNA samples and biomedical data from the SardiNIA/ProgeNIA study in the Lanusei valley, a project launched in 2001 by Giuseppe Pilia, a great geneticist and visionary, who sadly passed away too soon.
As to the work team, an important role in the DNA/RNA sequencing experiments in the new sequencing centre was played by Andrea Angius from CNR and Roberto Cusano from CRS4, with a team of technologists that included Marco Marcelli, Maria Francesca Urru and Manuela Oppo from CRS4.
Data processing and statistical data analysis were handled by a mixed team: computer scientists and statisticians from CRS4 recruited by Chris Jones (Frederic Reinier, Riccardo Berutti, Rossano Atzeni and Ilenia Zara) and researchers from my CNR team, including Serena Sanna and, soon after, Eleonora Porcu, Carlo Sidore, Mauro Pala, Maristella Steri, Giorgio Pistis and Fabrice Danjou, who were mainly involved in statistical data analysis.
By developing sequencing data processing procedures and statistical analysis approaches, we were able to conduct genome-wide association studies (GWAS) based on genome sequencing. The success of this initiative benefited greatly from the close collaboration with Goncalo Abecasis (from the University of Michigan at Ann Arbor), who played a leading role in developing statistical tools and creating the computer analysis pipeline that corrected spurious sequence variants arising from errors in the sequencing-by-synthesis (SBS) procedure.
Goncalo was also an important mentor for many of our young analysts, including Serena, Carlo and Eleonora. Goncalo and I shared the same approach to experimental design and analysis: performing the agnostic characterisation of the genomes of tens of thousands of individuals by means of high-throughput DNA chip genotyping, then carrying out whole-genome sequencing on a subset of several thousand individuals and creating a reference panel to which we applied statistical imputation approaches. This approach, which makes it possible to generate probabilistic genome sequences in all genotyped individuals at relatively low cost, is now routinely used, but at that time we were almost alone: in late 2007/early 2008, a project proposal based on these principles that we had submitted for EU funding was rejected. But we did not give up. By December 2010 we had already sequenced the whole genomes of 505 individuals (quite a feat at the time).
With Goncalo we set up two sequencing centres, one in Pula and the other in Ann Arbor, the latter to secure the funds from a Stimulus programme launched by Barack Obama that could only be spent within the USA. Some members of our team, including Fabio Busonero and Andrea Maschio for the sequencing procedures and some of the analysts mentioned above, moved to the USA to carry out part of the sequencing work. Thus, the number of Sardinian individuals sequenced by the joint action of the two centres grew, with increasingly accurate data, to 1146 in September 2011, 2120 in March 2012 and 3520 in September 2013 when, based on our experimental design, we reached the target sample size for our cohorts characterised with DNA chips.
Speaking of the team, I would like to remember fondly Gianluigi Zanetti, with whom I worked closely in those years. Gianluigi was a man of superior intellect, broad culture and great integrity. He was able to address any scientific problem, even the most complicated, in an analytical and creative way. In fact, he was irresistibly drawn to complex problems. The greater their complexity, the stronger his determination to find new solutions. As soon as he had solved one, he would look for the next challenge. Often at the end of the workday we would drive back from Pula together; I miss those moments. We would talk about everything; those chats were special and very enriching for me.
Gianluigi was also very good at choosing the people best suited to work on specific topics. While we were looking for solutions to some specific problems, Gianluigi introduced me to one of his closest collaborators at CRS4, Federico Santoni, currently leader of a genetic research group in Switzerland, with whom I developed both a professional collaboration and a friendship.
CRS4 computing centre
The atmosphere among the people who worked on the project was one of great excitement. Incredibly, in just a few years a top-notch research group was established in Sardinia: ‘A new genomic island’, as it was dubbed by an editorial in Nature Genetics.
In those years, working with Paolo Zanella, who with Chris Jones and his team was decisive in all phases of the project, we soon realised we needed to secure massive computing power for data analysis and data storage. Accordingly, Paolo ensured that a significant share of the resources allocated to the CRS4 supercomputing centre directed by Lidia Leoni would be dedicated to our projects. Lidia and her team also lent a hand enthusiastically to our initiative.
Since our computational needs had become – metaphorically speaking – ‘a bottomless pit’, Goncalo provided us with important resources from the University of Michigan’s computer centre that were vital in ensuring the completion of many projects.
Results of the work of those years
The progress made in those years in research on DNA sequencing and the genetics of multifactorial diseases allowed us, through the phylogenetic analysis of Y-chromosome sequencing data, to push back the origin of Homo sapiens by about 100,000 years (to some 200,000 years ago) in a paper published in Science [Francalacci et al. Science].
In a study that compared the genomic DNA from the bones of Ötzi, a 5,300-year-old natural mummy found in 1991 in a glacier in the Tyrol, with the DNA of modern individuals, we found that Ötzi’s genome was more similar to that of Sardinians than to that of any other population. We also demonstrated for the first time that this similarity extended to prehistoric DNA samples from cultural contexts associated with the spread of agriculture during the Neolithic transition. We hypothesized that this affinity reflects a common genetic component that had spread across Europe during the Neolithic together with agriculture [Sikora et al. PLoS Genetics].
Sardinian population studies, in particular in the Ogliastra district, conducted by the ProgeNIA project, also allowed us to identify new genes that regulate height (important for growth disorders), haemoglobin levels (important for anaemia) and lipid and inflammatory molecule levels (predisposing factors for cardiovascular disease). The results have been published in three studies in Nature Genetics.
One study [Zoledziewska et al. Nature Genetics] identified two genetic variants able to reduce human height by about 4 and 2 cm, respectively, compared to changes of about 0.3 cm typically produced by each of the approximately 700 previously discovered variants. By evaluating all these genetic variants, we were able to identify in Sardinians a systematic and non-random enrichment of height-reducing variants. This suggests a selective advantage for short stature in Sardinians, and is the first example in the human species of the already known ‘island effect’ whereby mammals tend to become smaller after spending hundreds of generations in an island environment.
A second study [Danjou et al. Nature Genetics] investigated the genetic regulation of haemoglobin synthesis. The joint study of the three different forms of haemoglobin found in human blood suggests that their regulation is genetically coordinated, and it expands the range of treatment targets for boosting the production of specific forms of haemoglobin (HbF and HbA2) in hereditary anaemias such as beta-thalassaemia and sickle-cell anaemia.
A third study [Sidore et al. Nature Genetics] discovered two new genes associated with circulating lipid levels and five new genes associated with inflammatory biomarkers, which have important clinical applications as predictors of cardiovascular and inflammatory disease risk. These results led Nature Genetics to dedicate the cover of its November 2015 issue to Sardinia.
This was accompanied by two editorials, ‘A new genomic island’ and ‘Small island, big genetic discoveries’, the first of which concluded that ‘Sardinia is now a prominent island on the global genomic map’.
DNA, RNA and proteins are the fundamental molecules of all known life forms. DNA contains the information that guides all cell processes. RNA, which is copied (‘transcribed’) from DNA, controls protein production and regulates a number of biological processes. While the DNA of all the cells in an organism is essentially identical, RNA can vary in quantity and quality in different types of cells. The great plasticity of RNA determines the development of different cells, organs and tissues from the same genetic information in DNA. In parallel with DNA sequencing at whole-genome level, we also completed RNA sequencing at whole-transcriptome level [Pala et al. Nature Genetics]. In this study, we were able to correlate RNA from nucleated blood cells with DNA. This has allowed us to identify thousands of genetic variants able to influence the quantity and sequence of certain RNAs and to provide important information on the mechanisms of action of genetic variants able to influence the risk of diseases or other health-related variables.
Our studies have also highlighted that genetics plays a fundamental role in regulating the circulating levels of the different immune system cells, identifying numerous sites of the human genome involved in such regulation in a paper published in Cell [Orrù et al. Cell]. The study analysed the role of genes in regulating the levels of hundreds of cell types, through a genome-wide association study conducted on thousands of individuals from four small towns in Sardinia included in the ProgeNIA/SardiNIA project, which studies the genetic basis of hundreds of biomedically relevant parameters. In some cases, the genetic variants involved in the quantitative regulation of immune system cells also play a role in determining the risk of autoimmune diseases such as type-1 diabetes, multiple sclerosis, coeliac disease, ulcerative colitis and rheumatoid arthritis.
Another interesting example of the results of the work done in those years is a study that allowed us to identify a particular form of the TNFSF13B gene associated with an increased risk of two autoimmune diseases, multiple sclerosis and systemic lupus erythematosus. In particular, we observed that the newly identified form of the gene increases the blood concentration of the cytokine BAFF, the protein product of the gene, which in turn increases the number of circulating B lymphocytes, confirming these cells’ long underestimated role in multiple sclerosis. These results suggest inhibition of BAFF as a new therapeutic opportunity for the treatment of this disease. This work has been published in the New England Journal of Medicine [Steri et al. NEJM], the world’s leading medical journal, and has been discussed in editorials in that and other journals.
I have selected eight significant studies, highlighting CRS4 co-authors
- Low-pass DNA sequencing of 1200 Sardinians reconstructs European Y-chromosome phylogeny.
Francalacci P, Morelli L, Angius A, Berutti R, Reinier F, Atzeni R, Pilu R, Busonero F, Maschio A, Zara I, Sanna D, Useli A, Urru MF, Marcelli M, Cusano R, Oppo M, Zoledziewska M, Pitzalis M, Deidda F, Porcu E, Poddie F, Kang HM, Lyons R, Tarrier B, Gresham JB, Li B, Tofanelli S, Alonso S, Dei M, Lai S, Mulas A, Whalen MB, Uzzau S, Jones C, Schlessinger D, Abecasis GR, Sanna S, Sidore C, Cucca F. Science. 2013 Aug 2;341(6145):565-9. doi: 10.1126/science.1237947.
- Population genomic analysis of ancient and modern genomes yields new insights into the genetic ancestry of the Tyrolean Iceman and the genetic structure of Europe.
Sikora M, Carpenter ML, Moreno-Estrada A, Henn BM, Underhill PA, Sánchez-Quinto F, Zara I, Pitzalis M, Sidore C, Busonero F, Maschio A, Angius A, Jones C, Mendoza-Revilla J, Nekhrizov G, Dimitrova D, Theodossiev N, Harkins TT, Keller A, Maixner F, Zink A, Abecasis G, Sanna S, Cucca F, Bustamante CD. PLoS Genet. 2014 May 8;10(5):e1004353. doi: 10.1371/journal.pgen.1004353. eCollection 2014 May.
- Genetic variants regulating immune cell levels in health and disease.
Orrù V, Steri M, Sole G, Sidore C, Virdis F, Dei M, Lai S, Zoledziewska M, Busonero F, Mulas A, Floris M, Mentzen WI, Urru SA, Olla S, Marongiu M, Piras MG, Lobina M, Maschio A, Pitzalis M, Urru MF, Marcelli M, Cusano R, Deidda F, Serra V, Oppo M, Pilu R, Reinier F, Berutti R, Pireddu L, Zara I, Porcu E, Kwong A, Brennan C, Tarrier B, Lyons R, Kang HM, Uzzau S, Atzeni R, Valentini M, Firinu D, Leoni L, Rotta G, Naitza S, Angius A, Congia M, Whalen MB, Jones CM, Schlessinger D, Abecasis GR, Fiorillo E, Sanna S, Cucca F. Cell. 2013 Sep 26;155(1):242-56. doi: 10.1016/j.cell.2013.08.041.
- Genome sequencing elucidates Sardinian genetic architecture and augments association analyses for lipid and blood inflammatory markers.
Sidore C, Busonero F, Maschio A, Porcu E, Naitza S, Zoledziewska M, Mulas A, Pistis G, Steri M, Danjou F, Kwong A, Ortega Del Vecchyo VD, Chiang CWK, Bragg-Gresham J, Pitzalis M, Nagaraja R, Tarrier B, Brennan C, Uzzau S, Fuchsberger C, Atzeni R, Reinier F, Berutti R, Huang J, Timpson NJ, Toniolo D, Gasparini P, Malerba G, Dedoussis G, Zeggini E, Soranzo N, Jones C, Lyons R, Angius A, Kang HM, Novembre J, Sanna S, Schlessinger D, Cucca F, Abecasis GR. Nat Genet. 2015 Nov;47(11):1272-1281. doi: 10.1038/ng.3368. Epub 2015 Sep 14.
- Height-reducing variants and selection for short stature in Sardinia.
Zoledziewska M, Sidore C, Chiang CWK, Sanna S, Mulas A, Steri M, Busonero F, Marcus JH, Marongiu M, Maschio A, Ortega Del Vecchyo D, Floris M, Meloni A, Delitala A, Concas MP, Murgia F, Biino G, Vaccargiu S, Nagaraja R, Lohmueller KE; UK10K consortium, Timpson NJ, Soranzo N, Tachmazidou I, Dedoussis G, Zeggini E; Understanding Society Scientific Group, Uzzau S, Jones C, Lyons R, Angius A, Abecasis GR, Novembre J, Schlessinger D, Cucca F. Nat Genet. 2015 Nov;47(11):1352-1356. doi: 10.1038/ng.3403. Epub 2015 Sep 14.
- Genome-wide association analyses based on whole-genome sequencing in Sardinia provide insights into regulation of hemoglobin levels.
Danjou F, Zoledziewska M, Sidore C, Steri M, Busonero F, Maschio A, Mulas A, Perseu L, Barella S, Porcu E, Pistis G, Pitzalis M, Pala M, Menzel S, Metrustry S, Spector TD, Leoni L, Angius A, Uda M, Moi P, Thein SL, Galanello R, Abecasis GR, Schlessinger D, Sanna S, Cucca F. Nat Genet. 2015 Nov;47(11):1264-1271. doi: 10.1038/ng.3307. Epub 2015 Sep 14.
- Population- and individual-specific regulatory variation in Sardinia.
Pala M, Zappala Z, Marongiu M, Li X, Davis JR, Cusano R, Crobu F, Kukurba KR, Gloudemans MJ, Reinier F, Berutti R, Piras MG, Mulas A, Zoledziewska M, Marongiu M, Sorokin EP, Hess GT, Smith KS, Busonero F, Maschio A, Steri M, Sidore C, Sanna S, Fiorillo E, Bassik MC, Sawcer SJ, Battle A, Novembre J, Jones C, Angius A, Abecasis GR, Schlessinger D, Cucca F, Montgomery SB. Nat Genet. 2017 May;49(5):700-707. doi: 10.1038/ng.3840. Epub 2017 Apr 10.
- Overexpression of the Cytokine BAFF and Autoimmunity Risk.
Steri M, Orrù V, Idda ML, Pitzalis M, Pala M, Zara I, Sidore C, Faà V, Floris M, Deiana M, Asunis I, Porcu E, Mulas A, Piras MG, Lobina M, Lai S, Marongiu M, Serra V, Marongiu M, Sole G, Busonero F, Maschio A, Cusano R, Cuccuru G, Deidda F, Poddie F, Farina G, Dei M, Virdis F, Olla S, Satta MA, Pani M, Delitala A, Cocco E, Frau J, Coghe G, Lorefice L, Fenu G, Ferrigno P, Ban M, Barizzone N, Leone M, Guerini FR, Piga M, Firinu D, Kockum I, Lima Bomfim I, Olsson T, Alfredsson L, Suarez A, Carreira PE, Castillo-Palma MJ, Marcus JH, Congia M, Angius A, Melis M, Gonzalez A, Alarcón Riquelme ME, da Silva BM, Marchini M, Danieli MG, Del Giacco S, Mathieu A, Pani A, Montgomery SB, Rosati G, Hillert J, Sawcer S, D'Alfonso S, Todd JA, Novembre J, Abecasis GR, Whalen MB, Marrosu MG, Meloni A, Sanna S, Gorospe M, Schlessinger D, Fiorillo E, Zoledziewska M, Cucca F. N Engl J Med. 2017 Apr 27;376(17):1615-1626. doi: 10.1056/NEJMoa161052
Editorials referring to the studies
Referring to Francalacci et al.:
Cann RL. Y weigh in again on modern humans. Science. 2013 Aug.
Referring to Sidore et al., Zoledziewska et al. and Danjou et al.:
A new genomic island. Nature Genetics. 2015 Nov;47(11):1221 and
Lettre and Hirschhorn, Small island, big genetic discoveries, Nature Genetics. 2015 Nov;47(11):1224-1225
Referring to Steri et al. NEJM:
Korn T, Oukka M. A BAFFling Association between Malaria Resistance and the Risk of Multiple Sclerosis. N Engl J Med. 2017 Apr;376(17):1680-1681
+ Stohl W. BAFF emerges from the genetic shadows. Nature Reviews Rheumatology. 2017 Aug;13(8):456-457
+ Comabella M. B cells and variant BAFF in autoimmune disease. Nature Reviews Neurology. 2017 Aug;13(8):453-454