
FAIR Data Principles & Digital Twins

2022-07-30




Illustration: © IoT For All






In plays, films, books, and music, there is often a key moment where everything in the story comes together. Software and data engineers know these moments: after days of work, you finally have your code and data in place, so you can write and run the “if” statement. A version of that statement might be: if river.level > X and rainfall.forecast > Y, then…. When it comes to rainfall and rivers, the “then” part could involve millions of pounds of damage, weeks of transport disruption, and possible loss of life. We will use this flooding algorithm to explore how FAIR data principles and digital twins could interact and cooperate.
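As a minimal sketch, the “if” statement described above could look like the following in Python. The threshold values standing in for X and Y, and the names `FloodThresholds` and `flood_risk`, are assumptions made for illustration; the article does not specify them.

```python
from dataclasses import dataclass


@dataclass
class FloodThresholds:
    river_level_m: float         # "X" in the text; the value used below is assumed
    rainfall_forecast_mm: float  # "Y" in the text; the value used below is assumed


def flood_risk(river_level_m: float, rainfall_forecast_mm: float,
               thresholds: FloodThresholds) -> bool:
    """Return True only when both readings strictly exceed their thresholds."""
    return (river_level_m > thresholds.river_level_m
            and rainfall_forecast_mm > thresholds.rainfall_forecast_mm)


# Illustrative numbers only, not real flood thresholds.
thresholds = FloodThresholds(river_level_m=3.5, rainfall_forecast_mm=40.0)

if flood_risk(4.1, 55.0, thresholds):
    print("Raise flood alert")  # the "then" part
```

The comparison itself is trivial; the rest of the article is about how the two inputs reach it.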




The “If” Statement

The “if” statement is our first kind of data interaction. A computer algorithm brings two pieces of data together so that they can be compared and some insight gained. But what those pieces of data are, and how they get to the “if” statement, is more complex than you might think.

There’s no search engine for data: not publicly, and rarely within enterprises. There are attempts at searchability, such as data.gov.uk, but they are intended for people, not algorithms. It is said that data scientists spend at least 50 percent of their time looking for data rather than looking at data. This epic waste of time arises because data is hidden, deliberately or unintentionally, in silos, in datasets, behind APIs, or in program-unfriendly formats such as PDF. Such data is not findable by machines, but what if it were?

Access & Interoperability

Access and interoperability are linked. A computer may be able to find some data but may not be able to understand it. It would help interoperability if the data carried metadata indicating, for example, that the river level was measured in meters and the rainfall in millimeters. We can now find, access, and interoperate with the data, and the data interaction in the “if” statement reuses that data for our new purpose.
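To illustrate why unit metadata helps, here is a small sketch in which each reading carries its own unit and an algorithm normalizes values before comparing them. The record layout (`value`, `unit`, `quantity`) and the helper `to_meters` are assumptions for this example, not a published FAIR schema.

```python
# Conversion factors from a declared unit to meters.
TO_METERS = {"m": 1.0, "cm": 0.01, "mm": 0.001}


def to_meters(reading: dict) -> float:
    """Convert a reading to meters using the unit named in its metadata."""
    unit = reading["unit"]
    if unit not in TO_METERS:
        # Without a recognized unit the value cannot be interpreted safely.
        raise ValueError(f"unknown unit: {unit!r}")
    return reading["value"] * TO_METERS[unit]


# Hypothetical readings; the numbers are made up for illustration.
river_level = {"quantity": "river_level", "value": 4.1, "unit": "m"}
rainfall = {"quantity": "rainfall_forecast", "value": 55.0, "unit": "mm"}

print(to_meters(river_level), to_meters(rainfall))
```

A reading with no unit, or an unrecognized one, is rejected rather than guessed at, which is the machine-side counterpart of "found but not understood".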

This is the basis of the FAIR data principles, conceived by a consortium of leading scientists and organizations to ensure that scientific data sets could be found and used by machines, with minimal human intervention. FAIR stands for Findable, Accessible, Interoperable, and Reusable – and it’s going mainstream.

The FAIR World

In a FAIR world, computers can find and understand data, but we still can’t program them with that “if” statement when the data is in large datasets. In our flooding scenario, what our algorithm also needs is the river level at a specific location and the rainfall forecast at a different location, probably well upstream from the place where the flood is likely to occur. So, even if our algorithm can find the right dataset, it still needs to know how to run a query against the dataset to find the data it wants.
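The extra query step can be sketched as follows: even once the right dataset is found, the algorithm must still select the latest reading for one particular location. The in-memory `READINGS` list, the station names, and the `latest_reading` helper are hypothetical stand-ins for whatever query interface a real dataset would expose.

```python
from datetime import datetime
from typing import Optional

# Hypothetical dataset rows; a real dataset would be far larger.
READINGS = [
    {"station": "river-town", "time": datetime(2022, 7, 30, 9, 0), "level_m": 3.2},
    {"station": "river-town", "time": datetime(2022, 7, 30, 10, 0), "level_m": 4.1},
    {"station": "river-upstream", "time": datetime(2022, 7, 30, 10, 0), "level_m": 2.7},
]


def latest_reading(rows: list, station: str) -> Optional[dict]:
    """Return the most recent row for the given station, or None if absent."""
    matches = [row for row in rows if row["station"] == station]
    return max(matches, key=lambda row: row["time"]) if matches else None


print(latest_reading(READINGS, "river-town"))
```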

The granularity of the data matters, and that is where digital twins come in. Digital twins are a virtualization of an asset’s data, and the asset itself is a useful level of granularity here. Our algorithm needs to choose the appropriate rainfall forecasts and the required river levels. Metadata about the assets beyond their location might also be useful. Knowing who operates them would help our algorithm assign weight to the readings if some operators’ data proved more reliable and accurate than others. Having some provenance showing that the data really comes from that twin, and that the twin really is the one operated by the Environment Agency, for example, would build trust in the output of our algorithm. The exchange of metadata between twins to establish trust and access is our second data interaction.
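One way to picture weighting readings by operator is a weighted average keyed on twin metadata. This is only a sketch: the `TwinReading` type, the weight table, and the default weight for unknown operators are all assumptions, and the article does not prescribe any particular weighting scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwinReading:
    twin_id: str
    operator: str   # metadata carried by the twin
    level_m: float


# Assumed trust weights per operator; purely illustrative values.
OPERATOR_WEIGHT = {"Environment Agency": 1.0, "Volunteer network": 0.5}


def weighted_level(readings: list, weights: dict,
                   default_weight: float = 0.1) -> float:
    """Average the levels, weighting each by how much its operator is trusted."""
    total_weight = sum(weights.get(r.operator, default_weight) for r in readings)
    if total_weight == 0:
        raise ValueError("no weighted readings to average")
    weighted_sum = sum(r.level_m * weights.get(r.operator, default_weight)
                       for r in readings)
    return weighted_sum / total_weight


readings = [
    TwinReading("gauge-1", "Environment Agency", 4.0),
    TwinReading("gauge-2", "Volunteer network", 3.4),
]
print(weighted_level(readings, OPERATOR_WEIGHT))
```

Verifying provenance (that a reading truly comes from the twin it claims) would need a separate mechanism such as signatures, which this sketch does not attempt.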

Timeliness

The final step to get to the ‘if’ statement is about timeliness. Homeowners won’t appreciate being told on Wednesday that a flood would occur on Tuesday when their houses are already knee-deep in muddy water. The data needs to flow between the twins and the algorithm as close to real time as possible so that the predictions are available in a timely way. This is not just important in our flooding scenario; it’s important in business, where latency between something happening and the business reacting to it can cost millions.
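Timeliness can be made explicit by checking the age of each reading before it is allowed into the comparison. In this sketch the 15-minute limit and the `is_fresh` helper are assumptions chosen for illustration; a real system would set the limit from its own requirements.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(minutes=15)  # assumed freshness limit


def is_fresh(measured_at: datetime, now: datetime,
             max_age: timedelta = MAX_AGE) -> bool:
    """True if the reading is not from the future and is at most max_age old."""
    age = now - measured_at
    return timedelta(0) <= age <= max_age


# Fixed timestamps so the example is reproducible.
now = datetime(2022, 7, 30, 12, 0, tzinfo=timezone.utc)
recent = datetime(2022, 7, 30, 11, 50, tzinfo=timezone.utc)
stale = datetime(2022, 7, 29, 12, 0, tzinfo=timezone.utc)

print(is_fresh(recent, now), is_fresh(stale, now))
```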

We have reached a point where we have an algorithm running and exchanging data with digital twins. But what does the algorithm do in the “then” part of the “if” statement? What if it could share the data back with other digital twins, or create new twins of the likely flood locations and have them share into a growing ecosystem of cooperative twins?

Data & Twin Interactions

If the algorithm has its own digital twin, the model becomes simpler and symmetric: everything is a twin. The twin of the algorithm interacts with the twins of the data sources. Data interactions are twin interactions, and twin interactions are the exchange of data and metadata between twins. If FAIR data principles and twins could interact and cooperate, imagine what transformations could be achieved.
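The symmetric "everything is a twin" model can be sketched with a single `Twin` class used for both the data sources and the algorithm. The class, its `follow` and `share` methods, the twin identifiers, and the thresholds are all hypothetical; this is not the API of any particular digital twin platform.

```python
from typing import Callable, Dict, List


class Twin:
    """A minimal twin: metadata, the latest shared data, and followers."""

    def __init__(self, twin_id: str, metadata: Dict[str, str]) -> None:
        self.twin_id = twin_id
        self.metadata = metadata
        self.latest: Dict[str, object] = {}
        self._followers: List[Callable[["Twin", Dict[str, object]], None]] = []

    def follow(self, callback: Callable[["Twin", Dict[str, object]], None]) -> None:
        self._followers.append(callback)

    def share(self, data: Dict[str, object]) -> None:
        self.latest = data
        for callback in list(self._followers):
            callback(self, data)


river = Twin("river-gauge-1", {"unit": "m"})
rain = Twin("rain-forecast-1", {"unit": "mm"})
flood = Twin("flood-model-1", {"kind": "algorithm"})  # the algorithm's own twin


def on_update(source: Twin, data: Dict[str, object]) -> None:
    # Run the "if" only once both source twins have shared a value.
    if "value" in river.latest and "value" in rain.latest:
        at_risk = river.latest["value"] > 3.5 and rain.latest["value"] > 40.0
        flood.share({"flood_risk": at_risk})  # share the result back as a twin


river.follow(on_update)
rain.follow(on_update)

alerts: List[Dict[str, object]] = []
flood.follow(lambda source, data: alerts.append(data))

river.share({"value": 4.1})
rain.share({"value": 55.0})
print(alerts)
```

Because the algorithm's output is shared through the same interface as its inputs, other twins could follow `flood` exactly as it follows `river` and `rain`.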


iotforall