How to Address Machine Learning Model Drift

2022-12-24

Illustration: © IoT For All

Most people know about artificial intelligence (AI), but fewer are well-versed in machine learning (ML). There is a lot to learn about the field, and new topics keep emerging. Machine learning model drift is one of them.

One drawback of operating an ML model is that it needs to be retrained as time passes. The accuracy of an ML model’s predictions decreases as business outcomes, the economy, and customer expectations change, a concept called “model drift.” 

When does ML model drift occur, and how can practitioners address it?

'AI and ML are becoming increasingly popular technologies in today’s digitally driven world. Some of the largest corporations leverage ML to deliver products and services.' -Zac Amos

What Is Machine Learning Model Drift?

AI and ML are becoming increasingly popular technologies in today’s digitally driven world. Some of the largest corporations leverage ML to deliver products and services. Take Netflix, for example. The streaming service uses ML models for several reasons, such as forming recommendations or learning which characteristics make content successful.

Businesses are investing in AI solutions, consumers are paying for ML-curated content, and engineers are finding new applications across industries. The most essential component of any AI or ML solution is data, both structured and unstructured. Data is complex and subject to change over time, and the information used for ML model training is no exception.

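An ML model suffers from drift when its predictions become less accurate over time, typically because the data it sees in production no longer resembles the data it was trained on. Model drift, also called model decay, can render the model unstable and make its predictions increasingly erroneous.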

A core principle of ML is that high-quality data produces accurate predictions. However, what the original model was trained to achieve may become irrelevant or outdated. ML engineers and specialists must go through the process of retraining and redeploying the model, making sure to use the latest training data available. If not, the model will continue to make predictions with low accuracy.

There are two types of model drift: concept and data.

Concept Drift

Concept drift occurs when the statistical properties of the target variable change, so the relationship between a model’s inputs and its target no longer matches what the model learned. During training, a model learns a function that maps input features to the target variable. As time goes on, that learned mapping stops holding, and the model cannot apply it in the new environment. This type of drift can occur seasonally, gradually, or suddenly, making it challenging to anticipate when it will happen.
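
To make the idea concrete, here is a minimal Python sketch of a sudden concept drift. It is not from the original article: the one-feature threshold classifier, the decision boundaries of 5.0 and 7.0, and every function name are illustrative assumptions. The inputs keep the same distribution throughout; only the rule that links an input to its label changes.

```python
import random

def label(x, boundary):
    """Ground-truth concept (assumed for illustration): positive above the boundary."""
    return int(x > boundary)

def accuracy(threshold, xs, ys):
    """Share of examples where the threshold rule agrees with the label."""
    hits = sum(int(x > threshold) == y for x, y in zip(xs, ys))
    return hits / len(xs)

def fit_threshold(xs, ys):
    """'Train' a toy classifier: keep the candidate threshold with the best accuracy."""
    return max(sorted(xs), key=lambda t: accuracy(t, xs, ys))

rng = random.Random(0)
train_x = [rng.uniform(0, 10) for _ in range(500)]
train_y = [label(x, 5.0) for x in train_x]
threshold = fit_threshold(train_x, train_y)

# Live inputs come from the same distribution as the training inputs.
live_x = [rng.uniform(0, 10) for _ in range(500)]

# Before the drift the old concept (boundary 5.0) still holds;
# afterwards the boundary has moved to 7.0 and the learned rule is stale.
acc_before = accuracy(threshold, live_x, [label(x, 5.0) for x in live_x])
acc_after = accuracy(threshold, live_x, [label(x, 7.0) for x in live_x])

print(f"accuracy under the old concept: {acc_before:.2f}")
print(f"accuracy under the new concept: {acc_after:.2f}")
```

In this toy setup, every input between the old and new boundary is now misclassified, even though nothing about the inputs themselves looks different.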

Data Drift

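Data drift, or covariate drift, occurs when the distribution of a model’s input data shifts away from the data it was trained on. Changes in the inputs carry through to the final predictions, even when the relationship between inputs and target stays the same. Because the variables no longer follow the distribution seen during training, users need to be aware of this discrepancy.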
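
One common way to check a single numeric feature for data drift is a two-sample Kolmogorov-Smirnov comparison between training values and recent production values. The article does not prescribe this test, so treat the sketch below as one option under stated assumptions: the function names are hypothetical, the samples are synthetic, and the constant 1.358 is the usual large-sample critical coefficient for a 5% significance level.

```python
import bisect
import random

def ks_statistic(reference, current):
    """Largest gap between the two empirical CDFs (two-sample KS statistic)."""
    ref = sorted(reference)
    cur = sorted(current)
    gap = 0.0
    # The largest gap is always reached at one of the observed values.
    for value in ref + cur:
        cdf_ref = bisect.bisect_right(ref, value) / len(ref)
        cdf_cur = bisect.bisect_right(cur, value) / len(cur)
        gap = max(gap, abs(cdf_ref - cdf_cur))
    return gap

def drifted(reference, current, c_alpha=1.358):
    """Flag drift when the KS statistic exceeds the large-sample critical value."""
    n, m = len(reference), len(current)
    critical = c_alpha * ((n + m) / (n * m)) ** 0.5
    return ks_statistic(reference, current) > critical

rng = random.Random(1)
training_values = [rng.gauss(0, 1) for _ in range(1000)]  # synthetic baseline feature
live_values = [rng.gauss(1, 1) for _ in range(1000)]      # same feature, mean shifted by 1

print("KS statistic:", round(ks_statistic(training_values, live_values), 3))
print("drift flagged:", drifted(training_values, live_values))
```

A check like this runs per feature, so a model with many inputs will produce many comparisons, and some flags will be false alarms.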

How to Address Model Drift

ML experts often use drift detection tools, which automate model monitoring. However, there are other ways data scientists and ML experts can handle cases of drift.

Here are the steps one would need to take to address model drift. 

Analyze the Drift

It’s vital to plot the distributions of the drifted features to determine what has changed to cause the drift. Do they still match the baseline distributions the static ML model was trained on? Some drifts are less meaningful than others, so experts must analyze each one carefully and decide whether it is worth addressing.
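
Alongside a plot, a single number can help rank which features moved the most. The sketch below bins a feature using the baseline's range and computes the Population Stability Index (PSI), a metric the article itself does not mention. The function names, the ten equal-width bins, and the small `eps` smoothing term are assumptions, and the baseline must contain at least two distinct values. A commonly cited rule of thumb treats a PSI above roughly 0.25 as a significant shift, but the right cutoff depends on the use case.

```python
import math

def histogram(values, edges):
    """Share of values in each bin; values beyond the edges land in the end bins."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        idx = 0
        while idx < len(counts) - 1 and v >= edges[idx + 1]:
            idx += 1
        counts[idx] += 1
    return [c / len(values) for c in counts]

def psi(baseline, current, bins=10, eps=1e-6):
    """Population Stability Index of `current` against `baseline`."""
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins + 1)]
    p = histogram(baseline, edges)  # expected shares, from training data
    q = histogram(current, edges)   # observed shares, from recent data
    # eps keeps the logarithm defined when a bin is empty.
    return sum((qi - pi) * math.log((qi + eps) / (pi + eps)) for pi, qi in zip(p, q))

baseline_feature = [float(v) for v in range(100)]   # synthetic training values
shifted_feature = [v + 50 for v in baseline_feature]  # same values moved up by 50

print("PSI, unchanged feature:", psi(baseline_feature, baseline_feature))
print("PSI, shifted feature:  ", round(psi(baseline_feature, shifted_feature), 2))
```

The per-bin shares returned by `histogram` are also what a bar chart of the two distributions would display side by side.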

Check Data Quality

Organizations that detect drift should first check the model’s input data. Something changed, but what? Is the model still relevant to the goals of the project? Data quality should always be the first suspect regarding cases of drift.
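
A first pass over input quality can be as simple as counting missing values and values outside the range seen in training. The following sketch assumes records arrive as dictionaries and that the expected ranges were noted at training time. The field names, ranges, and the 5% missing-value tolerance are made up for illustration, and the batch is assumed to be non-empty.

```python
def quality_report(rows, expected_ranges, max_missing=0.05):
    """Return {feature: [problems]} for features that look suspicious in this batch."""
    report = {}
    for feature, (low, high) in expected_ranges.items():
        values = [row.get(feature) for row in rows]
        present = [v for v in values if v is not None]
        problems = []
        missing_rate = 1 - len(present) / len(values)
        if missing_rate > max_missing:
            problems.append(f"missing rate {missing_rate:.0%}")
        out_of_range = [v for v in present if not low <= v <= high]
        if out_of_range:
            problems.append(f"{len(out_of_range)} value(s) outside [{low}, {high}]")
        if problems:
            report[feature] = problems
    return report

# Hypothetical batch of model inputs and the ranges observed during training.
batch = [
    {"age": 30, "income": 50000},
    {"age": 230, "income": None},
    {"age": 41, "income": 62000},
    {"age": 25, "income": None},
]
ranges = {"age": (0, 120), "income": (0, 1000000)}

for feature, problems in quality_report(batch, ranges).items():
    print(feature, "->", "; ".join(problems))
```

If a report like this comes back with problems, the apparent drift may be an upstream data bug rather than a real change in the world, and fixing the pipeline comes before retraining.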

Users can choose to address the drift or do nothing. An alert might be a false alarm, or people may be satisfied with how the drift affected predictions. However, sometimes change is necessary.

Retrain the Model

Since data distributions shift over time, it’s critical to retrain the model after drift is detected. Deploying an ML model is not a one-and-done project but a continuous one. 

The main reason to retrain a drifting model is that retraining keeps it current with the evolving relationship between input and output data. Check the model every few weeks or months throughout the year to ensure it’s working with the latest training information.
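
As a rough sketch of what a retraining rule might look like, the code below refits a toy model on a window of recent labeled examples once its error on that window grows well past the error measured at training time. Everything here is an assumption for illustration: the one-parameter model, the mean-absolute-error metric, the 1.5x tolerance, and the synthetic data. A real pipeline would substitute its own model, metric, and redeployment step.

```python
from statistics import mean

def fit(examples):
    """Toy model: least-squares slope through the origin, y = w * x."""
    num = sum(x * y for x, y in examples)
    den = sum(x * x for x, _ in examples)
    return num / den

def mae(w, examples):
    """Mean absolute error of the model on labeled (x, y) pairs."""
    return mean(abs(y - w * x) for x, y in examples)

def retrain_if_needed(w, recent, baseline_error, tolerance=1.5):
    """Refit on the recent window when its error exceeds tolerance x baseline."""
    if mae(w, recent) > tolerance * baseline_error:
        return fit(recent), True
    return w, False

def noise(x):
    """Deterministic +/-1 wobble so the toy data is not perfectly linear."""
    return 1 if x % 2 == 0 else -1

# Synthetic history: the relationship was y = 2x, and has since become y = 3x.
old_data = [(x, 2 * x + noise(x)) for x in range(1, 51)]
recent_data = [(x, 3 * x + noise(x)) for x in range(1, 51)]

w_old = fit(old_data)
baseline_error = mae(w_old, old_data)
w_new, retrained = retrain_if_needed(w_old, recent_data, baseline_error)

print(f"old slope {w_old:.3f}, new slope {w_new:.3f}, retrained: {retrained}")
```

Retraining only on the most recent window is a design choice: it adapts quickly, but it discards older data that may still be relevant if the drift turns out to be seasonal.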

Monitor for Issues

Once the model has learned from the new training data, keep watching to see whether retraining reduced the drift. Periodic updates are wise, and checking the model after retraining will help data scientists and other professionals see whether the drift still occurs.

If drift is detected, follow the steps outlined above. Drift detection tools are worthwhile investments, as they remove the extra responsibility and time needed to make corrections.
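
For teams without a dedicated tool, a lightweight monitor can track accuracy over the most recent predictions once true outcomes become available. The class below is a hypothetical sketch, not the interface of any particular drift detection product. The window size, the baseline accuracy, and the allowed drop of 10 percentage points are assumed values.

```python
from collections import deque

class AccuracyMonitor:
    """Tracks accuracy over the last `window` predictions and flags large drops."""

    def __init__(self, baseline, window=100, max_drop=0.10):
        self.baseline = baseline      # accuracy measured right after (re)training
        self.max_drop = max_drop      # tolerated drop before raising an alert
        self.outcomes = deque(maxlen=window)

    def record(self, prediction, actual):
        """Store whether one prediction matched the outcome observed later."""
        self.outcomes.append(prediction == actual)

    def rolling_accuracy(self):
        """Accuracy over the current window, or None before any outcome arrives."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def alert(self):
        """True once the window is full and accuracy fell too far below baseline."""
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return self.baseline - self.rolling_accuracy() > self.max_drop

monitor = AccuracyMonitor(baseline=0.9, window=10)
for _ in range(10):
    monitor.record(prediction=1, actual=1)  # model is still correct
alert_while_healthy = monitor.alert()
for _ in range(5):
    monitor.record(prediction=1, actual=0)  # outcomes start to disagree
alert_after_errors = monitor.alert()

print("alert while healthy:", alert_while_healthy)
print("alert after errors: ", alert_after_errors)
```

Waiting for a full window before alerting avoids reacting to the first few unlucky predictions, at the cost of a slower response to sudden drift.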

Beware of Drift in ML Projects

Drift is something every data scientist, researcher, and engineer should be aware of, especially in today’s competitive business sector. One of ML’s most notable features is the ability to use historical data to predict future outcomes. 

Outcomes become inaccurate when drift occurs, and business decisions based on them could damage the organization. Beware of both concept and data drift, as each greatly affects the model’s performance.

  • Machine Learning
  • Artificial Intelligence
  • Big Data
  • Data Analytics
