Neural Networks as a Basis for Genetics-Based Predictions
Traditionally, estimates of genetic propensity for a trait or disease have relied on single genetic variants identified as having a large effect on the outcome of interest. However, in the past 10 years, it has become increasingly clear that most complex traits are driven by a large number of genetic variants with small effects. As a result, polygenic scores (PGS) have become the focus of efforts to estimate genetic propensity. PGS are typically defined as a weighted linear combination of many small-effect genetic variants. Thus, PGS do not take into account higher-order interactions between variants, non-linear effects, or interactions with the environment. In practice, addressing these issues with standard statistical modeling techniques can be cumbersome. Neural networks provide a flexible alternative framework for evaluating non-linear extensions of PGS.
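As a rough illustration of the definition above, a linear PGS is a dot product between each individual's allele counts and a vector of per-variant weights. The sketch below is a minimal example with made-up toy data; the function name `polygenic_score`, the genotype matrix, and the weights are all hypothetical and are not taken from 23andMe's pipeline.

```python
import numpy as np

def polygenic_score(genotypes, weights):
    """Linear PGS: weighted sum of allele counts for each individual.

    genotypes: (n_individuals, n_variants) array of allele counts (0, 1 or 2).
    weights:   (n_variants,) array of per-variant effect sizes.
    """
    genotypes = np.asarray(genotypes, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return genotypes @ weights

# Toy data (hypothetical): 3 individuals, 4 variants.
G = np.array([[0, 1, 2, 0],
              [1, 1, 0, 2],
              [2, 0, 1, 1]])
beta = np.array([0.5, -0.25, 0.125, 1.0])

scores = polygenic_score(G, beta)
print(scores)  # [0.    2.25  2.125]
```

Because each variant contributes independently and additively, a score of this form cannot express interactions between variants or with environmental factors, which is the limitation the abstract points to.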
In this talk, I will first introduce the basic problem of identifying genetic variants associated with human traits and disease, and show how we currently define PGS using data collected from research participants at 23andMe. Next, I will show how the state-of-the-art process for defining PGS can be translated to deep neural networks, and propose a network architecture that can be used for multiple-outcome, genetics-based predictors. I will present preliminary results for predictions of weight and height that incorporate genetics, population ancestry, and environmental and lifestyle factors such as exercise frequency and eating behavior, and show how these neural network predictions compare with regression and machine learning models. Finally, I will conclude with a brief overview of some of the open problems motivated by issues such as missing and longitudinal data.
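The abstract does not describe the proposed architecture, so the following is only a generic sketch of what a multiple-outcome predictor could look like: a single shared hidden layer feeding two outputs (for example, weight and height), with genotypes and lifestyle covariates concatenated as input. The layer sizes, the ReLU activation, the random initialization, and all names (`init_params`, `forward`) are assumptions made for illustration, not the speaker's design.

```python
import numpy as np

def init_params(n_inputs, n_hidden, n_outputs, seed=0):
    """Randomly initialize a one-hidden-layer network (illustrative only)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.1, size=(n_inputs, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, size=(n_hidden, n_outputs)),
        "b2": np.zeros(n_outputs),
    }

def forward(params, genotypes, covariates):
    """Predict several outcomes at once from genotypes plus covariates."""
    # Concatenate genetic and non-genetic inputs into one feature matrix.
    x = np.concatenate([genotypes, covariates], axis=1)
    # Shared hidden representation with a ReLU non-linearity.
    hidden = np.maximum(0.0, x @ params["W1"] + params["b1"])
    # One output column per outcome (e.g. weight, height).
    return hidden @ params["W2"] + params["b2"]

# Toy inputs (hypothetical): 3 individuals, 4 variants, 2 covariates.
genotypes = np.array([[0, 1, 2, 0],
                      [1, 1, 0, 2],
                      [2, 0, 1, 1]], dtype=float)
covariates = np.array([[3.0, 1.0],
                       [0.0, 0.0],
                       [5.0, 1.0]])

params = init_params(n_inputs=6, n_hidden=8, n_outputs=2)
preds = forward(params, genotypes, covariates)
print(preds.shape)  # (3, 2)
```

The weights here are untrained, so the predictions carry no meaning; the point is only that the hidden layer lets variants and covariates interact, which a linear PGS cannot do.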
Nick Furlotte is a Senior Scientist at 23andMe, where he works with a small team focused on developing the next generation of the company’s consumer health platform. This R&D team develops predictive models for human traits and diseases that incorporate genetics, lifestyle and environment. His work is a mix of classical statistical genetics, epidemiology, machine learning and product development. Nick earned his PhD in Computer Science from UCLA and holds an MS in Bioinformatics and a BS in Computer Science, both from the University of Memphis.