What are the leading industry standards for Machine Learning Operations (MLOps)? It’s a question that deserves an answer, but one that neither the literature nor the industry has yet supplied. The following set of short guides provides good practices and principles for standardizing and streamlining machine learning (ML) life cycles.
In this article, we’ll begin by defining basic terminology and outlining the high-level picture of (1) what qualifies as a good MLOps framework and (2) what it should be able to do.
Where Does MLOps Fit In?
The last decade has been something of a golden age for Big Data and Artificial Intelligence (AI). Today’s reality is a world where bank fraud can be foreseen and prevented, where retail prices can be dynamically optimized for the maximum benefit of the consumer, and where image classification can further the robotics industry. Data science and artificial intelligence are at the epicenter of progress when it comes to performing successful prediction tasks.
Companies are hiring data scientists in droves to develop specialized models for their businesses. However, in their enthusiasm, they often fall prey to the same mistake. If a business focuses too much on developing ML models and too little on actually deploying or maintaining them, it may not achieve its desired results. This is where Machine Learning Operations (MLOps) fits in. Many data projects that go off track would have had a much better chance of success if MLOps practices had been applied.
Data scientists dive into solving the minutiae of isolated business processes that contribute to the overall success of the business. Their work focuses on (1) managing data and (2) creating machine learning (ML) models. In terms of managing data, they are responsible for curating available and relevant statistical information, which is essential before modelling can even start. In addition, training machine learning models requires a lot of experimentation, analysis, and hypothesis testing.
But how do we move from the exploration stage to a fully-functioning solution that is stable enough to be deployed in production? How do we make it extendable and easy to support? What should we do after we deploy our solution?
Unlike managing data and creating ML models, all of these questions require a normative, problem-solving perspective and a focus on automation. In the following sections, we will see how MLOps approaches these challenges.
What are MLOps and an MLOps Framework?
MLOps is a relatively new term for the practice of designing production frameworks that make developing and maintaining machine learning (ML) models seamless and efficient.
After you deploy your ML model, a good MLOps framework should be able to constantly monitor how your model and your system or application perform, instantly detect (and auto-correct) model- or system-specific issues, seamlessly integrate new features, and periodically version your artifacts (such as model configurations).
How to monitor MLOps?
Monitoring is embedded in all engineering practices. Machine learning should be no different. MLOps best practices encourage making expected behavior visible and setting standards that models should adhere to, rather than relying on a ‘gut feeling’.
That being said, it’s essential to track model and app performance metrics, specifically:
- Technical metrics
Technical metrics describe the operational limits of an ML model, such as the model’s latency (the time taken to predict one data unit) or data throughput (the number of data units processed in one unit of time).
- Performance metrics
Performance metrics (or evaluation metrics) for an ML model are measurable aspects, such as the model’s accuracy, that create a reference for future progress and improvement.
- Functional metrics
Functional metrics are application or infrastructure metrics, such as infrastructure health, application connections, and others affecting operational tasks.
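As a rough illustration of the first two categories, here is a minimal sketch of how latency, throughput, and accuracy could be measured. It assumes the model is exposed as a plain Python callable; the names measure_technical_metrics, accuracy, and toy_model are hypothetical and not part of any particular MLOps tool. Functional metrics usually come from infrastructure monitoring and are not covered here.

```python
import time
from statistics import mean


def measure_technical_metrics(predict, inputs):
    """Return (mean latency in seconds per item, throughput in items per second)."""
    if not inputs:
        raise ValueError("inputs must not be empty")
    latencies = []
    start = time.perf_counter()
    for item in inputs:
        t0 = time.perf_counter()
        predict(item)  # one prediction for one data unit
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    throughput = len(inputs) / elapsed if elapsed > 0 else float("inf")
    return mean(latencies), throughput


def accuracy(predictions, labels):
    """Performance metric: fraction of predictions that match the labels."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and equal in length")
    matches = sum(1 for p, y in zip(predictions, labels) if p == y)
    return matches / len(labels)


if __name__ == "__main__":
    # Hypothetical stand-in for a real model: predicts 1 for positive inputs.
    def toy_model(x):
        return int(x > 0)

    samples = [-2, -1, 1, 2]
    labels = [0, 0, 1, 1]
    latency, throughput = measure_technical_metrics(toy_model, samples)
    print(f"mean latency: {latency:.6f} s, throughput: {throughput:.0f} items/s")
    print("accuracy:", accuracy([toy_model(x) for x in samples], labels))
```

In practice these numbers would be logged on a schedule and compared against agreed thresholds, so that expected behavior stays visible over time.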
How to fix MLOps?
Detection engines are invaluable when it comes to building a secure and sustainable data infrastructure. The aim of a detection engine is to enable a fully automatic, self-healing system. Although the returns may diminish depending on the use case, creating an automated engine that can detect a problem and act upon it is critical. An MLOps engineer defines the scope of “what to fix”.
However, you probably don’t want to give your ‘solution’ too much power to change components. There are, nevertheless, some good applications for which you might want to use an automated detection engine.
A well-designed Automated Model Retraining feature can be among the safest additions to your solution, and can serve you in the following ways:
Retraining the ability to predict
Over time, machine learning models start to lose their predictive power for natural reasons. Changes in data patterns or shifting global dynamics can make what was once an ideal model suddenly imperfect. When a model falls out of sync with present demands, we call this model drift. In a perfect world, your solution should detect when this happens and retrain or otherwise modify the associated ML model so that it matches its intended purpose again. In other words, drift detection triggers retraining.
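One possible way to turn drift detection into a retraining trigger is sketched below. It uses the Population Stability Index (PSI), a common drift score that compares how a feature was distributed at training time with how it looks now. This is only one technique among many, and the function names, the bin count, and the 0.2 threshold are illustrative assumptions, not a prescribed standard.

```python
import math


def population_stability_index(reference, current, bins=10):
    """Compare two samples of one numeric feature; 0 means identical binned distributions."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all reference values are equal

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp values outside the reference range
            counts[idx] += 1
        eps = 1e-6  # floor for empty bins, avoids log(0)
        return [max(c / len(values), eps) for c in counts]

    ref_p = proportions(reference)
    cur_p = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


def should_retrain(reference, current, threshold=0.2):
    """Flag retraining when the drift score exceeds an (assumed) threshold."""
    return population_stability_index(reference, current) > threshold


if __name__ == "__main__":
    training_feature = [i / 100 for i in range(1000)]
    live_feature = [x + 5 for x in training_feature]  # simulated shift in the data
    print("retrain on unchanged data:", should_retrain(training_feature, training_feature))
    print("retrain on shifted data:", should_retrain(training_feature, live_feature))
```

Keeping the decision in a small function such as should_retrain makes it easy to limit what the automation is allowed to do: it only raises a flag, and the retraining job itself can still be reviewed or gated.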
Scaling of infrastructure
After an MLOps solution is launched, it might require more resources, such as storage capacity or computational power. As data workflow and storage demands change, the size and power of the system should adjust with them. This ability is known as “scalability.” A solid framework should predict spikes in resource demand, raise an alert, and scale the infrastructure accordingly.
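To make the idea of predicting and alerting concrete, here is a deliberately naive sketch. It extrapolates the average growth in resource usage one step ahead and raises a flag when the forecast crosses a chosen share of capacity. The function name scaling_alert, the 80% headroom, and the linear forecast are all assumptions for illustration; a real system would likely use a proper forecasting method and the autoscaling tools of its platform.

```python
def scaling_alert(usage_history, capacity, headroom=0.8):
    """Return (alert, forecast) for the next interval.

    usage_history: recent resource readings, oldest first (e.g. GB of storage used).
    capacity: the current resource limit, in the same unit.
    headroom: assumed fraction of capacity above which we want to scale.
    """
    if len(usage_history) < 2:
        raise ValueError("need at least two readings to estimate a trend")
    # Average change per interval over the observed window.
    step = (usage_history[-1] - usage_history[0]) / (len(usage_history) - 1)
    forecast = usage_history[-1] + step
    return forecast > headroom * capacity, forecast


if __name__ == "__main__":
    alert, forecast = scaling_alert([50, 60, 75], capacity=100)
    print(f"forecast: {forecast}, scale up: {alert}")
```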
How to integrate new features in MLOps?
Continuous Integration and Continuous Delivery (CI/CD) are among the core pillars of MLOps development. You should be able to add new features and upgrades to your solution through a process that checks whether those modifications are compatible with the running model, so that it is not disrupted.
For this, we use the following components:
- Testing: New functionality is integrated into our working production model on a continuous basis, so it’s absolutely critical to avoid functionality issues that could disrupt, and jeopardize, our model. The main goal is to test how a new feature gets along with your model or application before we actually deploy it to production.
- Code quality: Code quality should be assessed every time the code is modified. There should be an embedded mechanism checking the code quality and code coverage before it is pushed onward to the production servers.
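A compatibility check of this kind could look roughly like the sketch below, which a CI pipeline might run before a candidate model replaces the one in production. It assumes both models are plain callables and checks two things: that the candidate only returns allowed outputs, and that its accuracy on a held-out set does not drop by more than a tolerance. The function names and the 0.02 tolerance are hypothetical.

```python
def accuracy(predict, samples, labels):
    """Fraction of samples for which predict(sample) equals the label."""
    correct = sum(1 for x, y in zip(samples, labels) if predict(x) == y)
    return correct / len(labels)


def compatibility_problems(candidate, production, samples, labels,
                           allowed_outputs, max_drop=0.02):
    """Return a list of human-readable problems; an empty list means the check passed."""
    problems = []
    # 1. Output contract: the candidate must only return values the application expects.
    for x in samples:
        output = candidate(x)
        if output not in allowed_outputs:
            problems.append(f"unexpected output {output!r} for input {x!r}")
            break
    # 2. Regression check: accuracy must not fall noticeably below production.
    drop = accuracy(production, samples, labels) - accuracy(candidate, samples, labels)
    if drop > max_drop:
        problems.append(f"accuracy dropped by {drop:.3f}")
    return problems


if __name__ == "__main__":
    samples, labels = [-2, -1, 1, 2], [0, 0, 1, 1]

    def production(x):
        return int(x > 0)

    def broken_candidate(x):
        return "yes"  # violates the output contract

    print(compatibility_problems(broken_candidate, production, samples, labels, {0, 1}))
```

A pipeline step would typically fail the build whenever the returned list is non-empty, which keeps an incompatible change from reaching the production servers.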
How to version artifacts in MLOps?
Though version control for code is standard procedure in software development, machine learning solutions require a more involved process. This entails versioning the following artifacts:
- Code versioning: This is the practice of periodically saving and labeling your code to make the development process reliable and reproducible.
- Data versioning: This means storing and labeling the data our model uses. Training the same algorithm with the same configuration on different data will result in a different model, so data versioning is crucial for reproducibility.
- Model versioning: This means storing configuration files and model artifacts, and it should be done every time the model is modified. Besides reproducibility, model versions are often used as a reference for debugging, or when the business needs to know which model made a certain prediction and why.
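One simple way to tie the three artifacts together is to derive version labels from content hashes, as in the sketch below. It assumes the data fits in memory as a list of JSON-serializable rows and that the code version is a label you already have (for example a commit identifier from your version control system). The function names and the 12-character label length are illustrative choices; dedicated data and model versioning tools offer far more than this.

```python
import hashlib
import json


def data_version(rows):
    """Label a dataset by hashing its content; any changed row changes the label."""
    digest = hashlib.sha256()
    for row in rows:
        digest.update(json.dumps(row, sort_keys=True).encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()[:12]


def model_version(config, data_label, code_label):
    """Label a model by the configuration, data, and code that produced it."""
    payload = json.dumps(
        {"config": config, "data": data_label, "code": code_label},
        sort_keys=True,  # key order in the config must not affect the label
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


if __name__ == "__main__":
    rows = [{"amount": 10.5, "fraud": 0}, {"amount": 990.0, "fraud": 1}]
    d = data_version(rows)
    # "abc123" is a placeholder for a real code version label.
    m = model_version({"learning_rate": 0.1, "depth": 3}, d, "abc123")
    print("data version:", d)
    print("model version:", m)
```

Because the model label depends on the data and code labels, the same configuration trained on different data gets a different version, which makes it easier to answer which model made a certain prediction.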