
Artificial General Intelligence Is Not as Imminent as You Might Think

2022-07-15

To the average person, it must seem as if the field of artificial intelligence is making immense progress. According to the press releases, and some of the more gushing media accounts, OpenAI’s DALL-E 2 can seemingly create spectacular images from any text; another OpenAI system called GPT-3 can talk about just about anything; and a system called Gato that was released in May by DeepMind, a division of Alphabet, seemingly worked well on every task the company could throw at it. One of DeepMind’s high-level executives even went so far as to brag that in the quest for artificial general intelligence (AGI), AI that has the flexibility and resourcefulness of human intelligence, “The Game is Over!” And Elon Musk said recently that he would be surprised if we didn’t have artificial general intelligence by 2029.

Don’t be fooled. Machines may someday be as smart as people, and perhaps even smarter, but the game is far from over. There is still an immense amount of work to be done in making machines that truly can comprehend and reason about the world around them. What we really need right now is less posturing and more basic research.

To be sure, there are indeed some ways in which AI truly is making progress—synthetic images look more and more realistic, and speech recognition can often work in noisy environments—but we are still light-years away from general purpose, human-level AI that can understand the true meanings of articles and videos, or deal with unexpected obstacles and interruptions. We are still stuck on precisely the same challenges that academic scientists (including myself) have been pointing out for years: getting AI to be reliable and getting it to cope with unusual circumstances.

Take the recently celebrated Gato, an alleged jack of all trades, and how it captioned an image of a pitcher hurling a baseball. The system returned three different answers: “A baseball player pitching a ball on top of a baseball field,” “A man throwing a baseball at a pitcher on a baseball field” and “A baseball player at bat and a catcher in the dirt during a baseball game.” The first response is correct, but the other two answers include hallucinations of other players that aren’t seen in the image. The system has no idea what is actually in the picture as opposed to what is typical of roughly similar images. Any baseball fan would recognize that this was the pitcher who has just thrown the ball, and not the other way around—and although we expect that a catcher and a batter are nearby, they obviously do not appear in the image.

Image: a pitcher throwing a baseball, shown with Gato's three captions quoted above. Credit: Bluesguy from NY/Flickr

Likewise, DALL-E 2 couldn’t tell the difference between a red cube on top of a blue cube and a blue cube on top of a red cube. A newer system, Google’s Imagen, released in May, couldn’t tell the difference between an astronaut riding a horse and a horse riding an astronaut.

Credit: Imagen; From “Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding,” by Chitwan Saharia et al. Preprint posted online May 23, 2022

When systems like DALL-E make mistakes, the result is amusing, but other AI errors create serious problems. To take another example, a Tesla on autopilot recently drove directly towards a human worker carrying a stop sign in the middle of the road, only slowing down when the human driver intervened. The system could recognize humans on their own (as they appeared in the training data) and stop signs in their usual locations (again as they appeared in the training images), but failed to slow down when confronted by the unfamiliar combination of the two, which put the stop sign in a new and unusual position.

Unfortunately, the fact that these systems still fail to be reliable and struggle with novel circumstances is usually buried in the fine print. Gato worked well on all the tasks DeepMind reported, but rarely as well as other contemporary systems. GPT-3 often creates fluent prose but still struggles with basic arithmetic, and it has so little grip on reality it is prone to creating sentences like “Some experts believe that the act of eating a sock helps the brain to come out of its altered state as a result of meditation,” when no expert ever said any such thing. A cursory look at recent headlines wouldn’t tell you about any of these problems.

The subplot here is that the biggest teams of researchers in AI are no longer to be found in the academy, where peer review used to be coin of the realm, but in corporations. And corporations, unlike universities, have no incentive to play fair. Rather than submitting their splashy new papers to academic scrutiny, they have taken to publication by press release, seducing journalists and sidestepping the peer review process. We know only what the companies want us to know.

In the software industry, there’s a word for this kind of strategy: demoware, software designed to look good for a demo, but not necessarily good enough for the real world. Often, demoware becomes vaporware, announced for shock and awe in order to discourage competitors, but never released at all.

Chickens do tend to come home to roost though, eventually. Cold fusion may have sounded great, but you still can’t get it at the mall. The cost in AI is likely to be a winter of deflated expectations. Too many products, like driverless cars, automated radiologists and all-purpose digital agents, have been demoed, publicized—and never delivered. For now, the investment dollars keep coming in on promise (who wouldn’t like a self-driving car?), but if the core problems of reliability and coping with outliers are not resolved, investment will dry up. We will be left with powerful deepfakes, enormous networks that emit immense amounts of carbon, and solid advances in machine translation, speech recognition and object recognition, but too little else to show for all the premature hype.

Deep learning has advanced the ability of machines to recognize patterns in data, but it has three major flaws. The patterns that it learns are, ironically, superficial, not conceptual; the results it creates are difficult to interpret; and the results are difficult to use in the context of other processes, such as memory and reasoning. As Harvard computer scientist Les Valiant noted, “The central challenge [going forward] is to unify the formulation of … learning and reasoning.” You can’t deal with a person carrying a stop sign if you don’t really understand what a stop sign even is.

For now, we are trapped in a “local minimum” in which companies pursue benchmarks, rather than foundational ideas, eking out small improvements with the technologies they already have rather than pausing to ask more fundamental questions. Instead of pursuing flashy straight-to-the-media demos, we need more people asking basic questions about how to build systems that can learn and reason at the same time. Meanwhile, current engineering practice is far ahead of scientific understanding, working harder to use tools that aren’t fully understood than to develop new tools and a clearer theoretical grounding. This is why basic research remains crucial.

That a large part of the AI research community (like those who shout “Game Over”) doesn’t even see that is, well, heartbreaking.

Imagine if some extraterrestrial studied all human interaction only by looking down at shadows on the ground, noticing, to its credit, that some shadows are bigger than others, and that all shadows disappear at night, and maybe even noticing that the shadows regularly grew and shrank at certain periodic intervals—without ever looking up to see the sun or recognizing the three-dimensional world above.

It’s time for artificial intelligence researchers to look up. We can’t “solve AI” with PR alone.

This is an opinion and analysis article, and the views expressed by the author or authors are not necessarily those of Scientific American.
