Machine Learning Meets DevOps
Today’s mandate for faster business innovation, faster response to market changes, and faster development of new products demands a new paradigm for software development. DevOps is a set of practices that aims to shorten the time between making a change in Development, deploying that change to the Operations environment, and feeding Operations data back into Development. DevOps practices typically rely on large amounts of data coming from Operations; how much data depends on the architectural style, the underlying development technologies, and the deployment infrastructure. To make effective decisions in Development, e.g., architecture changes in continuous delivery pipelines, the big data produced in Operations must be processed efficiently. Where data streams are increasingly large-scale, dynamic, and heterogeneous, mathematical and algorithmic creativity is required to bring statistical methodology to bear. Statistical machine learning can fill the gap between Operations and Development with more efficient analytical techniques. Such techniques can provide deeper knowledge and uncover underlying patterns in the operational data, e.g., to detect anomalies or performance anti-patterns in operation. This knowledge is most valuable when it is obtained in time to refactor the development artifacts, including code, architecture, and deployment.
In this talk, I will start by motivating the necessity of data-driven analytics for generating feedback from Ops to Dev, based on my previous industrial experience with large-scale big data systems. I will present our recent work on configuration tuning of big data software, where we primarily applied Bayesian Optimization with Gaussian Processes to find optimal configurations effectively. I will also talk about transfer learning, which exploits complementary and cheap information (e.g., past measurements in a continuous delivery pipeline regarding early versions of the system) to learn accurate models efficiently and at considerably less cost. Results show that despite the high cost of measurement on the real system, learning performance models can become surprisingly cheap as long as certain properties are reused across environments. In the second half of the talk, I will present empirical evidence that lays a foundation for a theory explaining why and when transfer learning works, by showing the similarities of performance behavior across environments. I will present observations of the impact of environmental changes (such as changes to hardware, workload, and software versions, which are predominant in DevOps) for a selected set of configurable systems from different domains, to identify the key elements that can be exploited for transfer learning. These observations demonstrate a promising path for building efficient, reliable, and dependable software systems.
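To make the configuration-tuning idea concrete, the sketch below shows how Bayesian Optimization with a Gaussian Process surrogate can search a configuration space using far fewer measurements than a grid or random search. This is a minimal, generic illustration, not the speakers' actual tooling: the `latency` objective, the single "buffer size" knob, and the helper names are hypothetical, and standard scikit-learn/SciPy components stand in for whatever models were used in the real work.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def expected_improvement(X, gp, y_best, xi=0.01):
    """Expected improvement for minimization: how much each candidate
    configuration is expected to beat the best measurement so far."""
    mu, sigma = gp.predict(X, return_std=True)
    sigma = np.maximum(sigma, 1e-9)  # avoid division by zero
    imp = y_best - mu - xi
    z = imp / sigma
    return imp * norm.cdf(z) + sigma * norm.pdf(z)

def bayesian_optimize(objective, bounds, n_init=3, n_iter=15, seed=0):
    """Tune a single numeric configuration option within `bounds`
    by alternating GP model fitting and EI-guided measurement."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    X = rng.uniform(lo, hi, size=(n_init, 1))          # initial random configs
    y = np.array([objective(x[0]) for x in X])          # expensive measurements
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    for _ in range(n_iter):
        gp.fit(X, y)                                    # refit surrogate model
        cand = rng.uniform(lo, hi, size=(256, 1))       # candidate configs
        x_next = cand[np.argmax(expected_improvement(cand, gp, y.min()))]
        X = np.vstack([X, x_next])                      # measure the most
        y = np.append(y, objective(x_next[0]))          # promising candidate
    best = np.argmin(y)
    return X[best], y[best]

# Hypothetical measurement: latency as a function of a buffer-size knob,
# minimized at 42 with a floor of 5.0 time units.
latency = lambda b: (b - 42.0) ** 2 / 100.0 + 5.0
best_x, best_y = bayesian_optimize(latency, bounds=(0.0, 100.0))
```

With only 18 measurements of the (here synthetic) objective, the search concentrates near the low-latency region, which is the property that makes this approach attractive when each measurement means deploying and benchmarking a real system.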
Pooyan Jamshidi is an Assistant Professor at the University of South Carolina. Prior to his current position, he was a research associate at Carnegie Mellon University (2016-2018) and Imperial College London (2014-2016), where he primarily worked on transfer learning for performance analysis of highly configurable systems, including robotics and big data systems. He holds a Ph.D. from Dublin City University (2010-2014). Pooyan's general research interests lie at the intersection of software engineering, systems, and machine learning, with a primary focus on distributed machine learning. Before his Ph.D., Pooyan spent seven years in the software industry.