Developing a Scalable Workflow for Training Molecular Phenotype Models at Deep Genomics
How can people make sense of their genomes? At Deep Genomics, we build systems that predict the functional consequences of genetic variation. We use machine learning to build models that link DNA sequences to downstream molecular phenotypes, such as mRNA structure, abundance, and stability, which we then use to identify disease variants. Unlike image classification, where neatly formatted datasets of image-label pairs are readily available, training molecular phenotype models requires aggregating complex data sources and computationally intensive data processing. It can take months to go from raw data to a trained model. To address this delay, we designed an internal framework that abstracts the overall workflow into modular components and processes that make use of cloud computing and TensorFlow GPU computing. This has improved throughput 100-fold and has made the design, training, and evaluation of models a continuous process.
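The post does not describe the framework's actual API, but the idea of abstracting a workflow into composable, swappable stages can be sketched as follows. All names here (`Stage`, `Pipeline`, the toy stage functions) are hypothetical illustrations, not Deep Genomics' internal interfaces; in practice each stage could dispatch to a cloud job or a TensorFlow GPU training step.

```python
# Hypothetical sketch of a modular workflow abstraction. Stage and Pipeline
# are illustrative names, not the actual internal framework described in
# the post; real stages would wrap cloud jobs or GPU training runs.
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Stage:
    """One named step of the workflow (aggregation, preprocessing, training)."""
    name: str
    run: Callable[[Any], Any]


class Pipeline:
    """Chains stages so their outputs feed the next stage's input."""
    def __init__(self, stages: List[Stage]):
        self.stages = stages

    def run(self, data: Any) -> Any:
        for stage in self.stages:
            # Each stage transforms the data; swapping a stage (e.g. a new
            # preprocessing method) does not affect the rest of the pipeline.
            data = stage.run(data)
        return data


# Toy stages standing in for data aggregation, preprocessing, and model training.
pipeline = Pipeline([
    Stage("aggregate", lambda d: d + ["raw"]),
    Stage("preprocess", lambda d: [x.upper() for x in d]),
    Stage("train", lambda d: {"trained_on": d}),
])
result = pipeline.run([])
print(result)  # {'trained_on': ['RAW']}
```

The benefit of this shape is that design, training, and evaluation become a continuous loop: replacing one stage reruns only the affected part of the workflow rather than restarting a months-long manual process.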
Alice is an engineer, machine learning researcher, and software developer. She holds BASc and MASc degrees from the University of Toronto and started her PhD in the machine learning group on an NSERC Scholarship. She works on machine learning systems that predict the effects of protein-coding mutations and has expertise in NGS bioinformatics pipelines.