Compute power is becoming a bottleneck for developing AI. Here’s how you clear it.

2022-12-16

In less than a week, OpenAI’s first chatbot tool, ChatGPT, went viral, with billions of requests made to put the much-hyped system through its paces. Interest was so high that the company had to implement traffic management tools, including a queuing system and slower query handling, to cope with demand. The incident highlights the vast amount of compute power required to sustain large language models like GPT-3, the system on which ChatGPT is built.

OpenAI has been forced to introduce a queuing system and other traffic shaping measures due to demand for ChatGPT

As this and other types of advanced AI become more commonplace and are put to use by businesses and consumers, the challenge will be maintaining sufficient compute capacity to support them. But this is easier said than done: one expert told Tech Monitor that a lack of compute power is already creating a bottleneck that is holding back AI development. Turning to supercomputers, or entirely new hardware architectures, could be a potential solution.

The scale of compute power required to run ChatGPT

A large language model such as GPT-3 requires a significant amount of energy and computing power for its initial training. This is in part due to the limited memory capacity of even the largest GPUs used to train these systems, which forces multiple processors to run in parallel.

Even querying a model through ChatGPT requires multi-core CPUs if it is done in real time. This has made processing power a major barrier to how advanced an AI model can become.
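
To put a rough number on that: a commonly used approximation is that a dense transformer performs about two floating-point operations per parameter for every token it generates. The sketch below uses that rule of thumb; the response length and sustained hardware throughput are illustrative assumptions, not figures from the article.

```python
# Rough sketch of per-query inference cost for a GPT-3-sized model.
# Assumption: ~2 FLOPs per parameter per generated token (dense forward pass).
params = 175e9                      # GPT-3 parameter count
flops_per_token = 2 * params        # ~350 GFLOPs per output token
response_tokens = 200               # assumed length of a single chat reply
total_flops = flops_per_token * response_tokens

sustained_flops = 10e12             # assumed ~10 TFLOPS of sustained throughput
print(f"~{total_flops / sustained_flops:.0f} s of raw compute per reply")  # ~7 s
```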

GPT-3 is one of the largest language models ever created, with 175bn parameters. According to a research paper by Nvidia and Microsoft Research, “even if we are able to fit the model in a single GPU, the high number of compute operations required can result in unrealistically long training times”, with GPT-3 estimated to take 288 years to train on a single Nvidia V100 GPU.
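
That estimate can be reproduced, roughly, with a standard back-of-envelope calculation: training cost is commonly approximated as about six floating-point operations per parameter per token, and GPT-3 was trained on roughly 300bn tokens. The sustained-throughput figure below is an assumption, so the sketch only confirms that the answer is on the order of centuries rather than reconstructing the paper’s exact number.

```python
# Back-of-envelope: GPT-3 training time on a single Nvidia V100.
params = 175e9                     # GPT-3 parameters
tokens = 300e9                     # approximate training tokens
total_flops = 6 * params * tokens  # ~3.15e23 FLOPs, using the ~6*N*D rule of thumb

sustained_flops = 35e12            # assumed sustained V100 throughput (peak FP16 ~125 TFLOPS)
seconds = total_flops / sustained_flops
print(f"~{seconds / (365 * 24 * 3600):.0f} years on one V100")  # roughly 285 years
```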

Running processors in parallel is the most common way to speed things up, but it has its limitations: beyond a certain number of GPUs, the per-GPU batch size becomes too small, so adding further devices delivers diminishing returns while increasing costs.
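
A small sketch of that scaling effect, assuming data parallelism with a fixed global batch split evenly across devices; the global batch size used here is an assumption in the right ballpark for a GPT-3-scale run.

```python
# Illustration: per-GPU batch size shrinks as data parallelism scales out.
global_batch = 1536                  # assumed global batch, in sequences per step
for gpus in (64, 256, 1024, 4096):
    per_gpu = global_batch / gpus
    print(f"{gpus:5d} GPUs -> {per_gpu:7.2f} sequences per GPU per step")
# Beyond roughly 1,000 GPUs each device processes so little per step that it
# spends most of its time waiting on communication rather than computing.
```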

Hardware has already become a bottleneck for AI

Professor Mark Parsons, director of EPCC, the supercomputing centre at the University of Edinburgh, told Tech Monitor that a realistic limit is about 1,000 GPUs, and that the most viable way to run at that scale is a dedicated AI supercomputer. The problem, he said, is that even if GPUs become faster, the bottleneck will remain because the interconnects between GPUs, and between systems, aren’t fast enough.

“Hardware has already become a bottleneck for AI,” he declared. “After you have trained a subset of data on one of the GPUs you have to bring the data back, share it out and do another training session on all GPUs which takes huge amounts of network bandwidth and work off GPUs.”
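
The network bandwidth Parsons describes can be estimated with a simple sketch: in data-parallel training, every step ends with an all-reduce over a full copy of the gradients. The gradient precision and the ring all-reduce cost model below are assumptions used for illustration.

```python
# Rough estimate of per-step gradient traffic in data-parallel training.
# Assumptions: FP16 gradients and a ring all-reduce, which moves roughly
# 2 * (N - 1) / N times the gradient buffer per GPU per step.
params = 175e9
grad_bytes = params * 2                        # ~350 GB of FP16 gradients per replica

n_gpus = 1024
per_gpu_traffic = 2 * (n_gpus - 1) / n_gpus * grad_bytes
print(f"~{per_gpu_traffic / 1e9:.0f} GB moved per GPU per training step")  # ~700 GB
```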

“GPT and other large language models are being continuously developed and some of the shortcomings in training in parallel are being solved,” Parsons adds. “I think the big challenge is a supercomputing challenge, which is how we improve data transfer between GPU servers. This isn’t a new problem, and it’s one we’ve had in supercomputing for some time, but now that AI developers are turning to supercomputers they are realising this issue.”

He isn’t sure how quickly interconnect speeds will be able to catch up, as the fastest currently in the works have a throughput of about 800Gbps, which “is not fast enough today”.

“Computer networking speeds are improving but they are not increasing at the speed AI people want them to as the models are growing at a faster rate than the speed is increasing,” he says. “All people selling high-performance interconnects have roadmaps, have done designs and know where we are going in next five years – but I don’t know if the proposed 800Gbps will be enough to solve this problem as the models are coming with trillions upon trillions of parameters.”
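
To see why 800Gbps might still fall short, divide the gradient traffic from the earlier sketch by the link speed. This ignores overlap with computation and the multiple links a real server has, so it is only an upper bound on per-step communication time, not a measurement.

```python
# How long ~700 GB of per-step gradient traffic would take over one 800 Gbps link.
traffic_bytes = 700e9              # per-GPU traffic from the earlier sketch
link_bits_per_s = 800e9            # 800 Gbps interconnect
seconds_per_step = traffic_bytes * 8 / link_bits_per_s
print(f"~{seconds_per_step:.0f} s of pure communication per step")  # ~7 s
# Real systems overlap transfers with compute and spread them over many links,
# but the ratio shows why interconnect bandwidth, not GPU speed, caps scaling.
```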

He said this won’t be a major problem as long as AI developers continue to improve the efficiency of their algorithms. If they don’t manage to do that, there “will be a serious problem”, with delays until the hardware can catch up with the demands of the software.

Will new architectures be needed to cope with AI?

OpenAI’s upcoming large language model, GPT-4, is next in line for release. While it is rumoured to be an order of magnitude larger than GPT-3 in terms of power, it is also thought to be aiming to deliver this increased capability for the same server load.

Mirco Musolesi, professor of computer science at University College London, told Tech Monitor that developing large language models further will require improved software and better infrastructure. A combination of the two, plus hardware not yet developed, will end the bottleneck, he believes.

“The revolution is also architectural, since the key problem is the distribution of computation in clusters and farms of computational units in the most efficient way,” Professor Musolesi says. “This should also be cost-effective in terms of power consumption and maintenance as well.

“With the current models, the need for large-scale architectures will stay there. We will need some algorithmic breakthroughs, possibly around model approximation and compression for very large models. I believe there is some serious work to be done there.”
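
As a minimal illustration of the compression direction Musolesi points to, the sketch below applies naive post-training INT8 quantisation to a single weight matrix. It is not his proposed method, just a toy example of trading a small approximation error for a fourfold memory saving.

```python
import numpy as np

# Toy example: quantise one FP32 weight matrix to INT8 and measure the error.
rng = np.random.default_rng(0)
w = rng.standard_normal((4096, 4096)).astype(np.float32)  # a single dense layer

scale = np.abs(w).max() / 127.0
w_int8 = np.round(w / scale).astype(np.int8)              # 4x less memory
w_restored = w_int8.astype(np.float32) * scale            # dequantised approximation

err = np.abs(w - w_restored).mean()
print(f"memory {w.nbytes / 1e6:.0f} MB -> {w_int8.nbytes / 1e6:.0f} MB, "
      f"mean abs error {err:.5f}")
```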

The problem, he explained, is that AI isn’t well served by current computing architectures, because it relies on certain types of computation, including tensor operations, that call for specialist systems, while current supercomputers tend to be more general purpose.

New AI supercomputers, such as those in development by Meta, Microsoft and Nvidia, will solve some of these problems, “but this is only one aspect of the problem,” said Musolesi. “Since the models do not fit on a single computing unit, there is the need of building parallel architectures supporting this type of specialised operations in a distributed and fault-tolerant way. The future will probably be about scaling the models further and, probably, the ‘Holy Grail’ will be about ‘lossless compression’ of these very large models.”

This will come at a huge cost, and to reach the “millisecond” speeds at which a search engine can deliver thousands of results, AI hardware and software will “require substantial further investment”.

He says new approaches will emerge, including new mathematical models requiring additional types of operations not yet known, although Musolesi added that “current investments will also steer the development of these future models, which might be designed in order to maximise the utilisation of the computational infrastructures currently under development – at least in the short term”.

Read more: Will ChatGPT be used to write malware?

Topics in this article : AI , ChatGPT , GPT-3 , OpenAI
