小程序
传感搜
传感圈

Key Features of a Comprehensive MLOps Platform

2022-08-10
关注

Illustration: © IoT For All

Artificial Intelligence (AI) and Machine Learning (ML) present a boon for businesses as technologies with the potential to help organizations make better predictions, create innovative services for customers, and deliver faster business outcomes. Finance teams, operations, customer success, and marketing departments stand to benefit. That said, the overarching reason that organizations are facing challenges and delays in bringing ML models into production stems from the fact that models are different from traditional software, and most organizations don’t yet have frameworks and processes for dealing with these differences. Let’s take a look at how a MLOps platform can help with efficiency and collaboration.

'Any MLOps platform should take a human-centered approach - meaning it is designed to provide users with the critical information they need' -Navin BudhirajaClick To Tweet

What is MLOps?

MLOps is a practice for a subset of the ML model life cycle to help teams deploy, manage, and maintain machine learning models and drive consistency and efficiency across an organization. Similar to DevOps—a set of practices that integrate software development with IT operations—MLOps adds automation to streamline the orchestration of steps in the workflow that begins once a model is ready to go into production.

A machine learning model’s life cycle spans many steps, and typically they are all managed by different people across discrete systems that need to be connected. These systems are used for data collection, data processing, feature engineering, data labeling, model building, training, optimizing, deploying, risk monitoring, and retraining. And in each organization, different people and teams may own one or more steps.

In an ideal environment, ML models are solving company problems and driving better decision analyses. However, only a fraction of these models enters production. Even then, it typically takes months for a successful model to become active, according to Gartner. That’s because the process of deploying a machine learning model into production is often disjointed. Siloed teams of data engineers, data scientists, IT ops professionals, auditors, business domain experts, and ML engineering teams operate in a patchwork arrangement that bogs down the process.

Downfalls of MLOps

Part of the problem is that MLOps is still an emerging discipline, and different people perform the tasks that span MLOps in each organization. In some organizations, data scientists are involved in nearly every step of a model’s life cycle; in others, there may be discrete teams for each phase or teams that own one or more areas. For organizations to realize the full value of ML, models need to be put into production quickly and at scale. There needs to be a guide for how to handle MLOps that makes sense for an enterprise’s goals and the structure of its team. Because of this, MLOps platforms are taking on an increasingly critical function in expediting the ML efforts of organizations.

Platforms have the potential to deliver a blueprinted strategy to create repeatable and streamlined processes, regardless of whether the industry is manufacturing or financial services—or any other industry. End-to-end platforms can save an innumerable amount of time because of how many models can be deployed and monitored simultaneously while operating at the speed businesses need. The best MLOps platforms provide solutions for all ML stakeholders so they can not only deploy and manage models at scale but also foster efficiency through collaboration and communication across the different people using the platform at different stages. Let’s take a look at the four main aspects of a successful MLOps platform.

A Successful MLOps Platform

#1: A Collaborative Experience for all Stakeholders

Given that the key stakeholders—from data teams to engineers to risk auditors—tend to function in silos in many organizations, simplifying the process to enable any user to perform a specific role leads to better outcomes for efficiently performing tasks. Platforms that enable collaboration across an organization provide teams the ability to quickly operationalize models, regardless of the tools data scientists used to create those models. There is no longer a need to restrict any other user, such as machine learning engineers or IT teams. Each platform user should be able to use the tools they already have and leverage their expertise with those tools. Having a single, collaborative interface that can intuitively guide a user through the steps that abstract the complexity of the process is a beneficial component of MLOps.

#2: A User-First, Modular Architecture

Given that many organizations may be handling MLOps differently, platforms that meet them where they are offer immediate value. A platform with a modular architecture provides organizations the necessary flexibility to get up and run quickly by enabling each person to use the platform functionality that they need when they need it rather than forcing them to operate in a linear fashion.

For example, an organization may have data scientists with a preferred set of tools but who lack the ability to easily deploy or monitor models in production. An MLOps platform designed with openness and users in mind will offer easy plug-and-play components so each user can make decisions on the best cloud, database, repositories, and other components to use without having to make sweeping changes. Every company will implement the process of operationalizing models a bit differently, and modular architecture enables MLOps teams to leverage their entire suite of tools and seamlessly bring specific components of the platform into their ML workflows.

#3: An Emphasis on Optimization

As models become larger and increasingly more complex, one of the challenges organizations often run into is a dramatic increase in hardware or computing needs. Machine learning is, by definition, data-intensive and will cost organizations a lot of money without careful consideration of the infrastructure in place. Models that take a long time to head to production coupled with rising costs for hardware are a recipe for tension with executives and leadership weighing ROI in an organization.

MLOps platforms that can optimize models and present model performance and cost-saving data in a format that helps users make decisions based on the factors most important to them can alleviate some of the challenges organizations face as they ramp up ML modeling and production. As more companies deploy more ML models to more devices that can be on any cloud, edge devices, or on-prem, the ability to optimize models will become increasingly important.

#4: An Ability to Continuously Monitor Models in Production

It’s important for an MLOps platform to accelerate the process of bringing models to production. But once there, the real work begins, and platforms need to enable teams to continuously monitor risks, such as model performance and unstructured data, and quickly take action to mitigate operational and reputational risk.

ML models are not static. They are trained and tested in environments that are controlled, but when models are deployed into production they are making predictions based on real-world data which can be quite different for a variety of reasons. For example, a model’s performance or accuracy in predictions can change. Models also experience different types of drift, such as data drift, for when there is a significant change in buying patterns. This happened during COVID, for example, and led to former distribution patterns no longer being accurate.

Simplifying the Process

To help teams continuously monitor models in production, MLOps platforms should simplify the ability to:

1. Set alerts based on custom thresholds.
2. Provide quick at-a-glance access to key data points showing which models are failing.
3. Rapidly identify the root cause and take action.

Leveraging an integrated platform allows for the creation of a customized risk monitoring plan before and after deployment. A comprehensive approach to mitigating risk includes evaluating uncertainties within the data to guide AI/ML teams along the right path.

Platforms Must Put Humans First

We are still in the early stages of figuring out how to best use ML in enterprises. Any MLOps platform should take a human-centered approach—meaning it is designed to provide users with the critical information they need, an intuitive way to complete the tasks they need to complete, and the ability to collaborate and communicate with other stakeholders and colleagues. Platforms that put human workers first help build trust between people and ML. This garnered level of trust between person and machine helps ease the mind of the worker and allows the technology to perform many of the statistical tasks to aid these workers. The intentional design of such platforms will continue to focus on augmenting and amplifying human intelligence and delivering new opportunities to promote collaboration with advancing AI and ML initiatives.

Tweet

Share

Share

Email

  • Artificial Intelligence
  • Automation
  • Data Analytics
  • Machine Learning

  • Artificial Intelligence
  • Automation
  • Data Analytics
  • Data Scientist
  • Machine Learning

参考译文
综合MLOps平台的主要特点
人工智能(AI)和机器学习(ML)为企业带来了福音,因为这些技术有可能帮助企业做出更好的预测,为客户创造创新服务,并更快地交付业务结果。财务团队、运营、客户成功和营销部门都将从中受益。也就是说,组织在将ML模型引入生产中面临挑战和延迟的主要原因是模型不同于传统软件的事实,而且大多数组织还没有处理这些差异的框架和过程。让我们看看MLOps平台如何帮助提高效率和协作。MLOps是ML模型生命周期的一个子集,用于帮助团队部署、管理和维护机器学习模型,并推动整个组织的一致性和效率。与devops(一组将软件开发与IT运营集成在一起的实践)类似,mlops添加了自动化来简化工作流中的步骤编排,一旦模型准备投入生产,工作流就开始了。一个机器学习模型的生命周期跨越许多步骤,通常它们都由不同的人在需要连接的离散系统上进行管理。这些系统用于数据收集、数据处理、特征工程、数据标注、模型构建、培训、优化、部署、风险监测和再培训。在每个组织中,不同的人和团队可能拥有一个或多个步骤。在一个理想的环境中,ML模型是解决公司问题和驱动更好的决策分析。然而,只有一小部分车型进入生产。Gartner表示,即便如此,一个成功的模式通常也需要几个月的时间才能活跃起来。这是因为将机器学习模型部署到产品中的过程往往是脱节的。由数据工程师、数据科学家、IT运维专业人员、审计师、业务领域专家和ML工程团队组成的竖井团队以一种拼凑的方式运作,这使流程陷入停滞。部分问题在于,MLOps仍然是一种新兴的学科,在每个组织中由不同的人执行跨MLOps的任务。在一些组织中,数据科学家几乎参与了模型生命周期的每一步;在其他情况下,每个阶段可能有离散的团队,或者拥有一个或多个区域的团队。对于组织来说,要实现ML的全部价值,模型需要快速且大规模地投入生产。需要有一个关于如何处理对企业目标和团队结构有意义的MLOps的指南。正因为如此,MLOps平台在加速组织的ML努力方面承担着越来越重要的功能。无论该行业是制造业还是金融服务业,或者任何其他行业,平台都有潜力交付一个蓝图战略,以创建可重复的和精简的流程。端到端平台可以节省无数的时间,因为可以同时部署和监控许多模型,同时以业务所需的速度运行。最好的MLOps平台为所有的ML利益相关者提供解决方案,这样他们不仅可以大规模部署和管理模型,还可以通过在不同阶段使用平台的不同人员之间的协作和沟通来提高效率。让我们来看看一个成功的MLOps平台的四个主要方面。 考虑到关键利益相关者——从数据团队到工程师再到风险审计员——在许多组织中往往是各自为营,因此简化流程,让任何用户都能扮演特定的角色,从而能够更好地高效地执行任务。支持跨组织协作的平台为团队提供了快速操作模型的能力,无论数据科学家使用什么工具创建这些模型。不再需要限制任何其他用户,如机器学习工程师或IT团队。每个平台用户都应该能够使用他们已经拥有的工具,并利用他们在这些工具上的专业知识。拥有一个单一的协作界面,可以直观地指导用户完成抽象流程复杂性的步骤,这是MLOps的一个有益组件。考虑到许多组织可能会以不同的方式处理MLOps,满足它们的平台可以提供直接的价值。具有模块化架构的平台为组织提供了必要的灵活性,使每个人能够在需要时使用他们需要的平台功能,而不是强迫他们以线性方式操作。例如,一个组织可能拥有数据科学家,他们拥有首选的工具集,但缺乏在生产中轻松部署或监控模型的能力。设计时考虑到开放性和用户的MLOps平台将提供简单的即插即用组件,这样每个用户都可以决定使用最好的云、数据库、存储库和其他组件,而无需进行彻底的更改。每个公司实施模型操作的过程都略有不同,模块化架构使MLOps团队能够利用他们的整个工具套件,并无缝地将平台的特定组件引入ML工作流。随着模型变得越来越大,越来越复杂,组织经常遇到的挑战之一是硬件或计算需求的急剧增加。从定义上讲,机器学习是数据密集型的,在没有仔细考虑基础设施的情况下,机器学习将花费组织大量的资金。需要很长时间才能投入生产的模型,加上硬件成本的上升,会导致企业高管和领导层在权衡投资回报率时产生紧张情绪。MLOps平台可以优化模型,并以一种帮助用户基于对他们最重要的因素做出决策的格式呈现模型性能和节省成本的数据,可以缓解组织在提高ML建模和生产时面临的一些挑战。随着越来越多的公司将更多的ML模型部署到更多的设备上,这些设备可以位于任何云、边缘设备或预部署设备上,优化模型的能力将变得越来越重要。对于MLOps平台来说,加速将模型投入生产的过程非常重要。但一旦有了这些,真正的工作就开始了,平台需要让团队能够持续监控风险,比如模型性能和非结构化数据,并迅速采取行动降低运营和声誉风险。ML模型不是静态的。他们在受控的环境中进行训练和测试,但是当模型部署到生产中时,他们根据现实数据进行预测,而现实数据由于各种原因可能会有很大的不同。例如,模型的性能或预测的准确性可以改变。当购买模式发生重大变化时,模型也会经历不同类型的漂移,比如数据漂移。例如,这发生在COVID期间,并导致以前的分布模式不再准确。为了帮助团队在生产中持续监控模型,MLOps平台应该简化以下功能:2.设置告警阈值。提供对关键数据点的快速一瞥访问,显示哪些模型正在失效。迅速找出根本原因并采取措施。 利用一个集成的平台,可以在部署之前和之后创建定制的风险监控计划。降低风险的综合方法包括评估数据中的不确定性,以指导AI/ML团队沿着正确的道路前进。关于如何在企业中最好地使用ML,我们仍处于早期阶段。任何MLOps平台都应该采用以人为中心的方法——这意味着它旨在为用户提供他们需要的关键信息、一种完成他们需要完成的任务的直观方式,以及与其他利益相关者和同事协作和沟通的能力。将人类工人放在首位的平台有助于建立人和机器之间的信任,这种在人和机器之间积累的信任有助于缓解工人的思维,并允许技术执行许多统计任务来帮助这些工人。此类平台的有意设计将继续专注于增强和放大人类智能,并提供新的机会来促进与推进AI和ML倡议的合作。
您觉得本篇内容如何
评分

评论

您需要登录才可以回复|注册

提交评论

iotforall

这家伙很懒,什么描述也没留下

关注

点击进入下一篇

李飞飞:从保洁员到AI科学家

提取码
复制提取码
点击跳转至百度网盘