
Detect & Overcome Model Drift in MLOps

2022-08-25

Illustration: © IoT For All

Machine learning (ML) is widely considered the cornerstone of digital transformation, yet it is also highly vulnerable to the changing dynamics of the digital landscape. An ML model is optimized for, and defined by, the parameters and variables that were available at the time it was created. Model drift (or model decay) is the loss of predictive power in an ML model. It can be caused by changes in the digital environment and the resulting changes in variables such as data, and it is an inherent risk of how machine learning models work.

It is easy to introduce model drift into MLOps if you assume that all future variables will behave as they did when the ML model was first created. A model that runs under static conditions on static data should not suffer from poor performance, since the data it sees matches the data it was trained on. But a model operating in a dynamic, changing environment with many shifting variables will likely see its performance degrade.

'MLOps has made it easier to retrain models more frequently and at shorter intervals.' – Sigmoid

Types of Model Drift

Based on the changes in variables or predictors, model drift can be divided into two types:

  1. Concept drift: Concept drift occurs when the statistical properties of the target variable change. Simply stated, the relationship the model learned between its inputs and the target no longer holds, so the model stops functioning properly.
  2. Data drift: This is the most common form of model drift. It occurs when the statistical properties of certain predictors change. A model that succeeds in one environment can fail in another because its training data no longer reflects the variables it now encounters (a minimal detection sketch follows this list).
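To make the data drift case concrete, here is a minimal sketch, not taken from the original article, that compares the distribution of one numeric feature at training time against a recent production batch using SciPy's two-sample Kolmogorov-Smirnov test. The feature arrays and the 0.05 significance level are illustrative assumptions.

```python
# A minimal sketch of data drift detection, assuming we kept the training-time
# values of a numeric feature and can sample a recent batch of production values.
# The two-sample Kolmogorov-Smirnov test checks whether both samples could
# plausibly come from the same distribution.
import numpy as np
from scipy.stats import ks_2samp

def detect_data_drift(train_values, prod_values, alpha=0.05):
    """Return (drifted, statistic, p_value); drifted is True if the production
    distribution differs significantly from the training distribution."""
    result = ks_2samp(train_values, prod_values)
    drifted = result.pvalue < alpha
    return drifted, result.statistic, result.pvalue

# Illustrative usage with synthetic data: the production values have a shifted mean.
rng = np.random.default_rng(42)
train_feature = rng.normal(loc=0.0, scale=1.0, size=5_000)
prod_feature = rng.normal(loc=0.4, scale=1.0, size=1_000)

drifted, stat, p = detect_data_drift(train_feature, prod_feature)
print(f"KS statistic={stat:.3f}, p-value={p:.4f}, drift detected: {drifted}")
```

In practice such a check would run per feature on each incoming batch, with the significance level tuned to keep false alarms manageable.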

Model drift arises from the interplay of data drift and concept drift. Upstream data changes also play an important role: problems in the data pipeline can lead to missing data and to features not being generated at all.
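Upstream problems of this kind can often be caught with a simple pre-scoring check. The sketch below is an illustration built on assumptions, not the article's own tooling: it supposes incoming batches arrive as pandas DataFrames, that the training-time feature list was saved, and that the feature names shown are hypothetical.

```python
# A minimal sketch of an upstream data check, assuming incoming batches arrive
# as pandas DataFrames and the feature list used at training time was saved.
import pandas as pd

EXPECTED_FEATURES = ["age", "income", "num_logins", "region"]  # hypothetical names

def validate_batch(batch: pd.DataFrame, expected=EXPECTED_FEATURES, max_null_frac=0.2):
    """Report features that are missing entirely or mostly null in this batch."""
    missing = [col for col in expected if col not in batch.columns]
    mostly_null = [
        col for col in expected
        if col in batch.columns and batch[col].isna().mean() > max_null_frac
    ]
    return {"missing_features": missing, "mostly_null_features": mostly_null}

# Illustrative usage: "income" was dropped upstream and "region" is mostly null.
batch = pd.DataFrame({
    "age": [34, 51, 29],
    "num_logins": [12, 3, 8],
    "region": [None, None, "EU"],
})
print(validate_batch(batch))
```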

Consider an ML model built to catch spam emails, based on a generic template of the spam circulating at the time it was created. The model can identify and block those emails, preventing potential phishing attacks. But as the threat landscape evolves and cybercriminals grow more sophisticated, real-world spam no longer resembles the original template. A detection system that relies on previous years' variables will not be able to classify new hacking threats correctly when confronted with them. This is just one example of model drift.

Addressing Model Drift

Model drift must be detected early to preserve accuracy. Model accuracy drops over time as predicted values deviate further from actual values, and left unchecked this process can do irreparable damage to the entire model, so it is important to catch the problem early. A falling F1 score, which summarizes the model's precision and recall, is an easy signal that something is wrong.

Depending on the situation and the purpose of the model, a variety of other metrics may also be relevant. An ML model intended for medical use will be judged against a different set of metrics than one designed for business operations. The principle, however, is the same: if a chosen metric falls below its threshold, there is a high likelihood that model drift is occurring.
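As a hedged illustration of this thresholding idea, the sketch below computes precision, recall, and F1 with scikit-learn on a batch of recent predictions for which ground-truth labels have arrived, and flags possible drift when F1 drops below a chosen threshold. The 0.80 threshold and the label arrays are assumptions for demonstration only.

```python
# A minimal sketch of metric-based drift monitoring, assuming ground-truth
# labels eventually become available for a batch of recent predictions.
from sklearn.metrics import precision_score, recall_score, f1_score

F1_THRESHOLD = 0.80  # illustrative threshold; tune per use case

def check_model_health(y_true, y_pred, threshold=F1_THRESHOLD):
    """Compute precision/recall/F1 and flag possible drift if F1 falls below threshold."""
    precision = precision_score(y_true, y_pred)
    recall = recall_score(y_true, y_pred)
    f1 = f1_score(y_true, y_pred)
    return {"precision": precision, "recall": recall, "f1": f1,
            "drift_suspected": f1 < threshold}

# Illustrative usage with a small labeled batch.
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
y_pred = [1, 0, 0, 1, 0, 0, 0, 1, 1, 0]
report = check_model_health(y_true, y_pred)
print(report)  # if drift_suspected is True, trigger an alert or retraining
```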

In many cases, however, measuring a model's accuracy is difficult, especially when actual outcomes and predictions are hard to obtain together. This is one of the challenges of scaling ML models. Experience with a model in production can help you predict when drift is likely to occur, and models can be redeveloped at regular intervals to address impending drift.

It is also possible to keep the original model intact and build new models that improve or correct the predictions of the baseline model. When data changes over time, it is important to weight it accordingly: an ML model that gives more weight to recent data than to older data will be more robust and better able to handle future drift-related changes.
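One simple way to realize this recency weighting, assuming a timestamped training set and a scikit-learn estimator, is to decay each example's sample weight exponentially with its age and pass the weights to fit, as in the sketch below. The half-life and the synthetic data are illustrative choices.

```python
# A minimal sketch of recency-weighted retraining: newer examples get larger
# sample weights via exponential decay, assuming each row's age in days is known.
import numpy as np
from sklearn.linear_model import LogisticRegression

def recency_weights(ages_in_days, half_life_days=30.0):
    """Weight each example by 0.5 ** (age / half_life): today's data ~1.0,
    data one half-life old ~0.5, and so on."""
    ages = np.asarray(ages_in_days, dtype=float)
    return 0.5 ** (ages / half_life_days)

# Illustrative synthetic data: features X, labels y, and each row's age in days.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + 0.1 * rng.normal(size=200) > 0).astype(int)
ages = rng.integers(0, 180, size=200)

weights = recency_weights(ages, half_life_days=30.0)
model = LogisticRegression().fit(X, y, sample_weight=weights)
print(model.score(X, y))
```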

MLOps for Sustainable ML Models

There is no one-size-fits-all solution for ensuring that model drift is identified and dealt with promptly. Whether you rely on scheduled model retraining or on real-time machine learning, building a sustainable machine learning model is not an easy task.

MLOps has made it easier to retrain models more frequently and at shorter intervals. Data teams can automate model retraining, and scheduling is the simplest way to start. By automating retraining, teams can strengthen their existing data pipeline without code changes or a rebuild of the pipeline. And if the team discovers a new feature or algorithm, including it in the retrained model can significantly improve accuracy.
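A bare-bones version of scheduled retraining might look like the sketch below. It assumes the third-party schedule package and a hypothetical retrain_and_deploy() function that wraps the team's existing training pipeline; neither comes from the original article, and real deployments would typically use an orchestrator instead.

```python
# A minimal sketch of scheduled, automated retraining, assuming the third-party
# `schedule` package (pip install schedule) and an existing training pipeline
# wrapped in a single function. retrain_and_deploy is hypothetical.
import time
import schedule

def retrain_and_deploy():
    """Placeholder for the team's existing pipeline: pull fresh data,
    retrain the model, validate metrics, and promote it if it passes."""
    print("Retraining model on the latest data...")
    # e.g. load_data() -> train() -> evaluate() -> deploy_if_better()

# Retrain every Sunday at 02:00; the cadence is an assumption to be tuned.
schedule.every().sunday.at("02:00").do(retrain_and_deploy)

while True:
    schedule.run_pending()
    time.sleep(60)  # check once a minute whether the job is due
```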

Monitor Your Models!

There are many variables to consider when deciding how often models should be retrained. Sometimes waiting for a problem to appear is the best option, especially when there are no previous records to guide you. Models should also be retrained according to patterns tied to seasonal variations in their variables. Monitoring is crucial throughout. No matter what business domain you work in, constant monitoring at regular intervals remains the best way to detect model drift.


  • Artificial Intelligence
  • Data Analytics
  • Internet of Things
  • Machine Learning
