
The Importance of Data Quality for Successful AI/ML Modelling

2022-11-22

Illustration: © IoT For All

Artificial Intelligence (AI) and Machine Learning (ML) technologies have the potential to revolutionize many industries. But AI and ML have an Achilles heel that few people talk about. A 2019 study by Refinitiv, Smarter Humans, Smarter Machines: Artificial Intelligence / Machine Learning Global Study, found that the biggest barrier to the deployment and adoption of artificial intelligence and machine learning is poor data quality. Data from alternative sources and unstructured data are becoming increasingly important, but they must be “refined” before their insights become truly valuable.

The saying “garbage in, garbage out” applies to AI/ML deployment: if you give the models bad data, the analysis and results will be subpar, too. According to the Refinitiv survey, 66 percent of respondents said that poor data quality affects their ability to deploy machine learning and artificial intelligence technologies. The report also suggests that three of the four challenges of working with new data in ML models are related to data quality. These challenges include obtaining accurate information about the history, coverage, and population of the data; identifying incomplete or corrupt records; and cleaning and managing the data. One of the biggest challenges data scientists face is finding good-quality data, as they have to spend 80-90 percent of their time cleaning and normalizing bad data.

'Data quality is extremely important when performing data analysis, regardless of whether it is to be used for artificial intelligence or not.' - Amy Groden-Morrison

Why Is Data Quality Important?

Data quality is extremely important when performing data analysis, regardless of whether it is to be used for artificial intelligence or not. Data quality problems come in two forms:

  1. Missing data
  2. Incorrect data

Both issues are highly problematic, and the impact of each can only be determined on a case-by-case basis. If the data quality behind an ML model is not solid, the model leads to misunderstanding and wrong inferences. Research has shown that companies analyze market data and unstructured data along with their own company data. This means that they are combining three different data sources to gain insights. Traditionally, structured data has been the key to strong quantitative analysis. Unstructured data, however, is the main challenge for companies. Data from alternative sources is mostly unstructured and needs to be refined and validated for accuracy.
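As a rough illustration of the two problems named above, missing data and incorrect data, the sketch below counts each kind per field in a small batch of records. The field names (site_id, temperature_c), the validity rules, and the sample records are assumptions invented for this example; they do not come from the Refinitiv study or from any particular tool.

```python
# Hypothetical validity rules, one per field. A rule returns True when
# the value looks plausible. Real rules depend on the data set.
RULES = {
    "temperature_c": lambda v: isinstance(v, (int, float)) and -50 <= v <= 60,
    "site_id": lambda v: isinstance(v, str) and v.strip() != "",
}


def audit(records):
    """Count missing and incorrect values for every field in RULES."""
    report = {field: {"missing": 0, "incorrect": 0} for field in RULES}
    for record in records:
        for field, is_valid in RULES.items():
            value = record.get(field)
            if value is None:
                # Field absent or explicitly empty: missing data.
                report[field]["missing"] += 1
            elif not is_valid(value):
                # Field present but implausible: incorrect data.
                report[field]["incorrect"] += 1
    return report


# Made-up sample records for illustration only.
records = [
    {"site_id": "A1", "temperature_c": 21.5},
    {"site_id": "A2", "temperature_c": None},  # missing reading
    {"site_id": "", "temperature_c": 480.0},   # blank id, implausible reading
    {"temperature_c": 19.0},                   # site_id absent
]

if __name__ == "__main__":
    print(audit(records))
```

Counting the two categories separately matters because they usually call for different fixes: missing values may be collected again or imputed, while incorrect values first have to be detected by a rule like the ones assumed here.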

Machine learning approaches like natural language processing (NLP) are used to structure and refine text-based data. Facebook and Google have been focusing a great deal on unstructured data, and their success lies in making it easier to work with, more accurate, and more useful. Even though ML has made extracting information from unstructured data sources easier, it is still a time-consuming process, and training ML models requires a lot of skill and patience.

Mobile Apps: Missing Link to AI Interpretation

The best way to ensure that data is of good quality is to get it from a reliable source that’s easy to access. When it comes to trusted sources, using mobile apps can be one way. Mobile apps give you more control over data quality than traditional paper forms that many organizations still use, and you can easily access digital data whenever you need it. 

Mobile apps are key to artificial intelligence implementation because they can improve data quality. Traditional data comes from paper-based processes, which are prone to manual errors. If the data quality is bad, your artificial intelligence will suffer too, not to mention the lost information and time delays that come with paper forms. Replacing these processes with mobile app-based digital forms reduces errors and improves data quality. Mobile apps can automatically capture information like time, location, and other data, and can even validate calculations, digital signatures, barcodes, and readings. In particular, mobile apps that collect field data are critical to successful AI implementation when field data is a key data source for the model.
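To make the idea of capture-time validation concrete, here is a minimal sketch of what a digital form could check before a record is accepted: it stamps the capture time automatically, verifies a barcode check digit, and re-checks a calculated total. The form fields (barcode, quantity, unit_price, total), the function names, and the choice of the EAN-13 barcode format are assumptions made for this example, not a description of any specific app builder.

```python
from datetime import datetime, timezone


def ean13_is_valid(code: str) -> bool:
    """Check the EAN-13 check digit (weights 1,3,1,3,... on the first 12 digits)."""
    if len(code) != 13 or not (code.isascii() and code.isdigit()):
        return False
    digits = [int(ch) for ch in code]
    weighted = sum(d * (3 if i % 2 else 1) for i, d in enumerate(digits[:12]))
    return (10 - weighted % 10) % 10 == digits[12]


def validate_submission(form: dict) -> list:
    """Return a list of human-readable problems; an empty list means the form passed."""
    errors = []
    if not ean13_is_valid(str(form.get("barcode", ""))):
        errors.append("barcode: failed EAN-13 check digit")
    quantity = form.get("quantity")
    unit_price = form.get("unit_price")
    total = form.get("total")
    if None in (quantity, unit_price, total):
        errors.append("quantity, unit_price and total are required")
    elif abs(quantity * unit_price - total) > 0.005:
        errors.append("total: does not equal quantity * unit_price")
    return errors


def capture(form: dict):
    """Stamp the capture time automatically, then validate the stamped record."""
    stamped = dict(form, captured_at=datetime.now(timezone.utc).isoformat())
    return stamped, validate_submission(stamped)


if __name__ == "__main__":
    record, problems = capture(
        {"barcode": "4006381333931", "quantity": 3, "unit_price": 2.5, "total": 7.5}
    )
    print(record["captured_at"], problems)
```

Rejecting a record at entry, while the person filling in the form can still correct it, is what separates this approach from paper forms, where the same mistakes surface only later during cleanup.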

The Real Costs of Bad Data

We may not realize it, but bad data can cost a lot of money (as much as $10 per record). A report from a data quality company, “The Real Costs of Bad Data,” notes that up to 20 percent of the information gathered by staff is incorrect. The report suggests that verifying information can cost up to one dollar per record. This money goes towards paying employees, running computers, and using a validation solution.

However, the one-dollar-per-record figure can be misleading, as costs rise significantly if batch processing is used for validation. In that case the cost rises to $10 per record, and even that figure is an underestimate if the company has no mechanisms in place to check records. The cost may then amount to $100 per record due to returned mail, misplaced shipments, and lost marketing opportunities. This means you will lose revenue and spend enormous amounts of money on shipping. Simply put, bad data not only costs money to refine and repair but also causes a loss in revenue because the company cannot deliver to customers or reach potential ones.
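The per-record figures above ($1, $10, and $100) and the 20 percent error rate can be turned into a quick back-of-the-envelope estimate. The sketch below assumes that each cost applies once per incorrect record and uses a made-up data set of 50,000 records; the stage names and the function are illustrative and are not taken from the report.

```python
# Per-record costs quoted in the article, in US dollars. The stage names
# are labels chosen for this example.
COST_PER_BAD_RECORD = {
    "verify_at_entry": 1,
    "batch_cleanup": 10,
    "left_unchecked": 100,
}


def bad_data_cost(total_records: int, error_rate: float = 0.20) -> dict:
    """Estimate the cost of bad records at each stage.

    Assumes the quoted cost is incurred once for every incorrect record.
    """
    bad_records = round(total_records * error_rate)
    return {stage: bad_records * cost for stage, cost in COST_PER_BAD_RECORD.items()}


if __name__ == "__main__":
    # Hypothetical data set of 50,000 records, 20 percent of them incorrect.
    for stage, dollars in bad_data_cost(50_000).items():
        print(f"{stage}: ${dollars:,}")
```

Under these assumptions, 10,000 incorrect records cost $10,000 to verify at entry, $100,000 to clean up in batch, and $1,000,000 if left unchecked, which is the tenfold escalation at each stage that the passage describes.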

The best way to minimize bad data is going paperless and digitizing all processes. You can save a lot of money by going paperless, improving productivity, and reducing the hidden costs of dealing with bad data. Building powerful apps will help your company save time and reduce costs. Paper-based processes take a lot of time and labor to manage when everything can be digitized with minimal human intervention. 

Mobile App Builder

To make mobile apps that can facilitate your business processes, you will need the right app builder to build mobile forms for any mobile device and go paperless. For this, low-code development platforms can be ideal as they allow citizen developers to build enterprise apps. Many low-code development platforms can develop mobile-based forms in minutes with the latest mobile app features (like GPS, camera, etc.) to capture data accurately and quickly. 


  • Artificial Intelligence
  • Data Analytics
  • Machine Learning

