From raw data to actionable clinical insights: High-throughput analysis with Microsoft and the Databricks Unified Analytics Platform for Genomics (UAP4Genomics)
The first human genome took 13 years and over $2.5 billion to sequence. Today, a human genome can be sequenced in a couple of days for less than the price of the latest iPhone. As a result, pharmaceutical and healthcare organizations are profiling the genomes of millions of patients, generating thousands of petabytes of data. Genomic sequencing is expected to generate up to 40 exabytes of data per year by 2025, dwarfing the amount of data generated by YouTube and Twitter combined.
When paired with clinical data, genomic data offers huge potential to accelerate drug discovery, predict disease risks, and personalize treatments, enabling healthcare providers to significantly improve patient outcomes.
The end-to-end analysis of large-scale genomics data remains complex and expensive. In this workshop, we will walk through how Microsoft and the Databricks Unified Analytics Platform for Genomics (UAP4Genomics) simplify the process of turning raw sequencing data into actionable insights at population scale. We will demonstrate how to call variants in a single sample using our accelerated GATK4 pipeline, and then use tools like Hail to characterize the association of variants in a population with clinical phenotypes. Once we identify variants associated with phenotypes of interest, we will use machine learning to model genome-wide disease risk in a reproducible manner.
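As a rough illustration of what a genotype-phenotype association test computes, here is a minimal pure-Python sketch (not code from the workshop or from Hail). It assumes a quantitative phenotype and genotypes encoded as dosages (0, 1 or 2 copies of the alternate allele), and for each variant fits an ordinary least-squares regression of phenotype on dosage, reporting the effect size and t-statistic. Covariates such as ancestry principal components are omitted for brevity. The function names `assoc_test` and `scan` and the toy data are hypothetical; at population scale, Hail provides distributed versions of this kind of per-variant test, for example `hl.linear_regression_rows`.

```python
import math
from typing import Dict, List, Tuple


def assoc_test(dosages: List[float], phenotype: List[float]) -> Tuple[float, float]:
    """Regress phenotype on genotype dosage for one variant (OLS with intercept).

    Returns (beta, t_stat): the per-allele effect size and its t-statistic.
    """
    n = len(dosages)
    if n != len(phenotype) or n < 3:
        raise ValueError("need matching vectors with at least 3 samples")
    mean_x = sum(dosages) / n
    mean_y = sum(phenotype) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        # Every sample has the same genotype, so there is nothing to test.
        raise ValueError("monomorphic variant: no genotype variation")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, phenotype))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x
    # Residual variance uses n - 2 degrees of freedom (intercept + slope).
    rss = sum((y - intercept - beta * x) ** 2 for x, y in zip(dosages, phenotype))
    se = math.sqrt((rss / (n - 2)) / sxx)
    t_stat = beta / se if se > 0 else math.copysign(math.inf, beta)
    return beta, t_stat


def scan(genotypes: Dict[str, List[float]],
         phenotype: List[float]) -> List[Tuple[str, float, float]]:
    """Test every variant and rank by |t|, strongest association first."""
    results = [(vid, *assoc_test(d, phenotype)) for vid, d in genotypes.items()]
    return sorted(results, key=lambda r: abs(r[2]), reverse=True)


if __name__ == "__main__":
    # Hypothetical toy cohort: 6 samples, 2 variants.
    genotypes = {"var_a": [0, 0, 1, 1, 2, 2], "var_b": [0, 1, 0, 1, 0, 1]}
    phenotype = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]
    for vid, beta, t in scan(genotypes, phenotype):
        print(f"{vid}: beta={beta:.3f} t={t:.2f}")
```

In this toy data the phenotype rises by about one unit per alternate allele of `var_a`, so `var_a` ranks first. A real analysis would also convert t-statistics to p-values and correct for the number of variants tested.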
Key technologies employed: GATK4/variant calling, Hail/genotype-phenotype association tests, population-scale risk modeling via ML, ML model training/deployment
Amir Kermany is a Genomics/HLS Solutions Architect at Databricks, where he applies more than 15 years of academic and industry experience to today’s challenges in analyzing and productizing the vast amounts of genomics and clinical data. Dr Kermany’s past positions include Senior Data Scientist at Shopify, Senior Staff Scientist at AncestryDNA, and Postdoctoral Scholar at the Howard Hughes Medical Institute (University of Chicago) and the University of Montreal. He holds a PhD in Mathematics, an MSc in Electrical Engineering and a BSc in Physics.