
How to Achieve Data Quality in the Cloud

2022-08-28

You've finally moved to the cloud. Congratulations! But now that your data is in the cloud, can you trust it? With more and more applications moving to the cloud, the quality of information is becoming a growing concern. Erroneous data can cause many business problems, including decreased efficiency, lost revenue and even compliance issues. This blog post will discuss the causes of poor data quality and what companies can do to improve it.

Ensuring data quality has always been a challenge for most enterprises, and the problem grows when dealing with data in the cloud or sharing data with external organizations because of technical and architectural challenges. Cloud data sharing has become increasingly popular as businesses seek to take advantage of the cloud's scalability and cost-effectiveness. However, without a strategy to ensure data quality, the return on investment from these data analytics projects can be questionable.

Related: Why Bad Data Could Cost Entrepreneurs Millions

What contributes to data quality issues in the cloud?

Four primary factors contribute to data quality issues in the cloud:

  • When you migrate your system to the cloud, the legacy data may not be of good quality. As a result, poor-quality data gets carried forward into the new system.
  • Data may become corrupted during migration, or cloud systems may not be configured correctly. For example, a Fortune 500 company restricted its cloud data warehouse to storing numbers up to eight decimal places. This limitation caused truncation errors during migration, resulting in a $50 million reporting issue (a pre-migration check for this kind of precision loss is sketched after this list).
  • Data quality can be a problem when data from different sources must be combined. For example, two departments of a pharmaceutical company used different units (number versus packs) to store inventory information. When this information was consolidated into the cloud data warehouse, the unit inconsistencies made reporting and analysis a nightmare.
  • Data from external vendors can be of questionable quality.
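
As one hedged illustration of the truncation risk above, the sketch below (Python with pandas) flags values that would lose precision if the target warehouse stores only eight decimal places. The DataFrame, the "amount" column and the eight-decimal limit are assumptions taken from the example, not a general rule.

```python
# Minimal sketch: flag values that would be silently truncated if the target
# cloud warehouse stores at most eight decimal places (as in the example above).
# The DataFrame and "amount" column are illustrative assumptions.
import pandas as pd

WAREHOUSE_DECIMALS = 8  # assumed precision limit of the target warehouse

def find_truncation_risks(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return rows whose values change when rounded to the warehouse precision."""
    rounded = df[column].round(WAREHOUSE_DECIMALS)
    return df[rounded != df[column]]

legacy_df = pd.DataFrame({"amount": [1.234567891234, 2.5, 0.000000001]})
risks = find_truncation_risks(legacy_df, "amount")
print(f"{len(risks)} value(s) would lose precision during migration")
```

Running a check like this before migration lets the team widen the target column or round intentionally, rather than discovering the discrepancy in downstream reports.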

Related: Your Data Might Be Safe in the Cloud But What Happens When It Leaves the Cloud?

Why is validating data quality in the cloud difficult?

Everybody knows data quality is essential, and most companies spend significant money and resources trying to improve it. Despite these investments, companies still lose between $9.7 million and $14.2 million annually because of poor-quality data.

Traditional data quality programs do not work well for identifying data errors in cloud environments because:

  • Most organizations only look at the data risks they already know about, which is likely just the tip of the iceberg. Data quality programs usually focus on completeness, integrity, duplicate and range checks, yet these represent only 30 to 40 percent of all data risks. Many data quality teams do not check for data drift, anomalies or inconsistencies across sources, which contribute to over 50 percent of data risks (a minimal drift-check sketch follows this list).
  • The number of data sources, processes and applications has exploded with the rapid adoption of cloud technology, big data applications and analytics. These data assets and processes require careful data quality control to prevent errors in downstream systems.
  • The data engineering team can add hundreds of new data assets to the system in a short period, but the data quality team usually needs about one to two weeks to validate each new asset. This means the data quality team has to prioritize which assets to review first, and as a result, many assets never get checked.
  • Organizational bureaucracy and red tape often slow down data quality programs. Because data is a corporate asset, any change requires approvals from multiple stakeholders, so data quality teams must go through a lengthy process of change requests, impact analysis, testing and sign-offs before implementing a data quality rule. This process can take weeks or even months, by which time the data may have changed significantly.
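
As a concrete illustration of the kind of predictive check most programs skip, the sketch below compares a new batch of data against a historical baseline with a two-sample Kolmogorov-Smirnov test and flags likely drift. This is a minimal example using scipy; the data, sample sizes and significance threshold are assumptions for illustration, not prescribed values.

```python
# Minimal data-drift sketch: compare a new batch against a historical baseline
# with a two-sample Kolmogorov-Smirnov test. Data and threshold are illustrative.
import numpy as np
from scipy.stats import ks_2samp

def drifted(baseline: np.ndarray, new_batch: np.ndarray, alpha: float = 0.01) -> bool:
    """Flag drift when the two distributions differ at the chosen significance level."""
    _, p_value = ks_2samp(baseline, new_batch)
    return p_value < alpha

rng = np.random.default_rng(42)
baseline = rng.normal(100, 5, size=10_000)   # e.g., historical order amounts
new_batch = rng.normal(115, 5, size=1_000)   # today's load, with a shifted mean
print("Drift detected:", drifted(baseline, new_batch))
```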

What can you do to improve the quality of cloud data?

To ensure data quality in the cloud, it is essential to use a strategy that accounts for these factors. Below are some tips for achieving it:

  • Check the quality of your legacy and third-party data, and fix any errors you find before migrating to the cloud. These quality checks will add cost and time to the project, but a healthy data environment in the cloud will be worth it.
  • Reconcile the cloud data with the legacy data to ensure nothing was lost or changed during the migration (a minimal reconciliation sketch follows this list).
  • Establish governance and control over your cloud data and processes. Monitor data quality on an ongoing basis and take corrective action when errors are found. This will help prevent issues from getting out of hand and becoming too costly to fix.
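
The reconciliation step above can start very simply. The sketch below, assuming both datasets fit in pandas DataFrames, compares row counts and numeric column totals between the legacy extract and the cloud copy; in practice the inputs would come from the legacy database and the cloud warehouse, and the tolerance is an assumption.

```python
# Minimal reconciliation sketch: compare row counts and numeric column totals
# between the legacy extract and the cloud copy. Data and tolerance are illustrative.
import pandas as pd

def reconcile(legacy: pd.DataFrame, cloud: pd.DataFrame,
              numeric_cols: list[str], tolerance: float = 1e-6) -> list[str]:
    """Return a list of human-readable discrepancies between the two datasets."""
    issues = []
    if len(legacy) != len(cloud):
        issues.append(f"row count mismatch: {len(legacy)} vs {len(cloud)}")
    for col in numeric_cols:
        legacy_total, cloud_total = legacy[col].sum(), cloud[col].sum()
        if abs(legacy_total - cloud_total) > tolerance:
            issues.append(f"{col}: sum {legacy_total} vs {cloud_total}")
    return issues

legacy = pd.DataFrame({"amount": [10.0, 20.0, 30.5]})
cloud = pd.DataFrame({"amount": [10.0, 20.0, 30.0]})  # one value changed in flight
print(reconcile(legacy, cloud, ["amount"]) or "No discrepancies found")
```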

In addition to the traditional data quality process, data quality teams must analyze and establish predictive data checks, including checks for data drift, anomalies and inconsistencies across sources. One way to achieve this is to use machine learning techniques to identify hard-to-detect data errors and augment current data quality practices. Another is to adopt a more agile approach to data quality and align with Data Operations teams to accelerate the deployment of data quality checks in the cloud.
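
As one hedged illustration of the machine-learning approach, the sketch below uses scikit-learn's Isolation Forest to score records so unusual rows can be routed for human review. The features and contamination rate are assumptions for the example; any comparable anomaly-detection technique could fill the same role.

```python
# Minimal anomaly-detection sketch: an Isolation Forest flags unusual records
# so they can be routed for review. Features and thresholds are illustrative.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal = rng.normal(loc=[100, 5], scale=[10, 1], size=(1_000, 2))  # typical records
odd = np.array([[100.0, 50.0], [900.0, 5.0]])                      # suspect records
records = np.vstack([normal, odd])

model = IsolationForest(contamination=0.01, random_state=0).fit(records)
labels = model.predict(records)  # -1 marks likely anomalies
print(f"{(labels == -1).sum()} record(s) flagged for review")
```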

Migrating to the cloud is complex, and data quality should be top of mind to ensure a successful transition. Adopting a strategy for achieving data quality in the cloud is essential for any business that relies on data. By considering the factors that contribute to data quality issues and putting the right processes and tools in place, you can ensure high-quality data, and your cloud data projects will have a greater chance of success.

Related: Streamline Your Data Management, Web Services, Cloud, and More by Learning Amazon Web Services
