New applications for artificial intelligence technologies are rapidly emerging, with healthcare often touted as the industry to be most disrupted by AI.
Scientists are now developing machine learning technologies to transform precision medicine, genetic testing, diagnostics and the development of therapies. At Deep Genomics, they're building systems that predict the functional consequences of genetic variation, using machine learning on models that link DNA sequences to downstream molecular phenotypes, which are used to identify disease variants.
However, whereas image classification has neatly formatted datasets of image-label pairs, training molecular phenotype models requires aggregating complex data sources and computationally intensive data processing. It can take months to go from raw data to a trained model, so Deep Genomics designed an internal framework that uses cloud computing and TensorFlow GPU computing to address the delay, improving their throughput 100-fold.
Alice Gao, Research Scientist & Engineer at Deep Genomics, will be speaking at the Women in Machine Intelligence & Healthcare Dinner, on 12 October in London, on 'Developing a Scalable Workflow for Training Molecular Phenotype Models'. I spoke to her ahead of the event to learn more about her work, and the application of machine intelligence in healthcare.
What are you currently working on at Deep Genomics?
Deep Genomics is addressing the “genotype-phenotype gap”. We are working to connect individual genetic variations to phenotypes which impact health. This must be done in a way that is reliable, scalable and trustworthy. Our team has diverse skill sets, and we collaborate on multiple concurrent projects. Personally, my recent focus is engineering an internal framework to more rapidly train and improve our molecular phenotype models.
What do you feel are the leading factors enabling recent advancements and uptake of machine learning and deep learning?
Massive datasets, the use of GPUs, and cloud computing enable us to train deep neural networks with hyperparameter tuning on millions of data points within days. Furthermore, researchers invented regularization techniques such as dropout to overcome the problem of overfitting. This enables neural networks to generalize well even as they become deeper and more complex.
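To make the dropout idea concrete, here is a minimal sketch of the common "inverted dropout" variant in plain Python. It is an illustration only, not Deep Genomics' code: the function name `dropout` and its parameters are assumptions made for this example. During training, each activation is zeroed with probability `rate` and the survivors are rescaled so the expected value is unchanged; at inference time the input passes through untouched.

```python
import random


def dropout(activations, rate, rng, training=True):
    """Inverted dropout on a list of floats (illustrative helper, not a library API).

    Each value is dropped (set to 0.0) with probability `rate`; kept values
    are divided by (1 - rate) so the expected sum stays the same.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        # At inference time dropout is a no-op.
        return list(activations)
    keep_probability = 1.0 - rate
    return [
        value / keep_probability if rng.random() < keep_probability else 0.0
        for value in activations
    ]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    hidden = [1.0] * 10
    print(dropout(hidden, rate=0.5, rng=rng))
    print(dropout(hidden, rate=0.5, rng=rng, training=False))
```

Because a different random subset of units is silenced on every training step, the network cannot rely on any single unit, which is what discourages overfitting.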
What are the current challenges of using machine learning to identify disease variants?
We need models to identify and interpret rare, causal variants, in both coding and noncoding regions of the genome. Each individual is different, so those variants need to be interpreted within one’s genomic context. Predicting disease phenotype directly from genetic variants ignores the underlying biological mechanism, so the resulting model won’t generalize well to other disease types, or even to a variation of the same disease.
At Deep Genomics, we build models for the intermediate stage we refer to as ‘molecular phenotypes’ to bridge the gap between genotype and phenotype. A variant, or combination of variants, and ultimately one’s genotype can be evaluated for its effect on molecular phenotypes, which can be linked to the disease phenotype.
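One way to picture evaluating a variant "for its effect on molecular phenotypes" is to score the reference sequence and the variant-carrying sequence with the same model and take the difference. The sketch below assumes exactly that framing; it is not Deep Genomics' method. `toy_molecular_phenotype` is a deliberately trivial placeholder (GC fraction) standing in for a trained model, and all function names are hypothetical.

```python
def apply_variant(reference, position, ref_allele, alt_allele):
    """Return the sequence with ref_allele at `position` (0-based) replaced by alt_allele."""
    end = position + len(ref_allele)
    if reference[position:end] != ref_allele:
        raise ValueError("reference allele does not match the sequence")
    return reference[:position] + alt_allele + reference[end:]


def toy_molecular_phenotype(sequence):
    """Placeholder 'model': fraction of G/C bases. A real model would be trained."""
    return sum(base in "GC" for base in sequence) / len(sequence)


def variant_effect(model, reference, position, ref_allele, alt_allele):
    """Predicted change in the molecular phenotype caused by the variant."""
    alternate = apply_variant(reference, position, ref_allele, alt_allele)
    return model(alternate) - model(reference)


if __name__ == "__main__":
    effect = variant_effect(toy_molecular_phenotype, "ATGCATGC", 0, "A", "G")
    print(effect)
```

Because the score is computed on the individual's full sequence rather than on the variant in isolation, the same variant can receive different scores in different genomic contexts, and combinations of variants can be scored by applying them together before calling the model.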
What developments can we expect to see in machine intelligence in the next 5 years?
The current machine learning research workflow will be accelerated significantly by improved software frameworks, larger datasets, and more cloud-based computation power. Steps in the workflow may be automated and replaced by AI, enabling human experts to spend more time advancing core theory and algorithms. A new application in applied machine learning requires defining the problem, generating data, and combining the trained model with domain knowledge for interpretation. Most of these steps are application-specific and not well defined. Correct and efficient execution within a short research turnaround time will become crucial.
What is the next step for machine learning and genome biology?
Genome biology is so complex that human experts are far from understanding the complete picture. Large amounts of data generated from high-throughput experiments will exacerbate this problem. We won’t be able to deal with bias and noise, or make sense of the data, without the help of machine learning. On the other hand, trying to learn a complete model by enveloping data is extremely hard, if possible at all. The system must be categorized into a hierarchy of processes, where each one can be uniquely represented. Starting from a well-studied process, hand-wired logic can be constructed and will improve as more data becomes available.
Alternatively, processes where our knowledge is limited but large amounts of data exist can be replaced by a machine learning model. Predictions from models will also support creating new hypotheses to be experimentally validated. Lastly, results can be leveraged to iterate and improve on the models.
I see a new generation of genome biology where data and domain knowledge are combined under the umbrella of machine learning, for deeper mechanistic understanding and improved diagnostics.
Alice Gao will be speaking at the Women in Machine Intelligence & Healthcare Dinner in London on 12 October, an evening of discussions and networking focused on the progress and application of machine intelligence in the healthcare sector, and celebrating the women advancing the field. Other speakers include Razia Ahamed, Google DeepMind, and Kathy McGroddy Goetz, IBM Watson Health. To book your tickets, please visit the event site.
Over the course of the dinner, hear from leading female experts in Machine Intelligence and discuss the impact of AI fields including machine learning, deep learning and robotics in healthcare. Attendees will establish new connections and network with peers including Founders, CTOs, Data Scientists and Medical Practitioners.
Check out our Women in Tech & Science series for more Q&As. See the full events list for events taking place in London, Amsterdam, Boston, San Francisco, New York, Hong Kong and Singapore.