
AI coding assistants leave developers “deluded” about software quality – study

2022-12-29

Artificial intelligence-based coding assistants like GitHub’s Copilot leave developers “deluded” about the quality of their work, resulting in more insecure and buggy software, a new study from Stanford University has found. One AI expert told Tech Monitor it’s important to manage expectations when using AI assistants for such a task.

GitHub introduced its Copilot AI assistant in 2021 and it is widely used by developers to “improve productivity” (Photo: Postmodern Studio/Shutterstock)

The study involved a group of 47 developers: 33 had access to an AI assistant while writing code, while the other 14 formed a control group working unaided. Each participant had to complete five security-related programming tasks, including one requiring them to encrypt and decrypt a string using a symmetric key. All of them could use a web browser to search for help, but only the 33 in the assisted group had the AI assistant.
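
For context, the core of the symmetric-key task can be handled safely in a few lines of Python. The sketch below is purely illustrative and is not the study’s reference solution; it assumes the third-party cryptography package, whose Fernet recipe provides authenticated symmetric encryption:

    # Illustrative sketch, assuming the third-party "cryptography" package
    # (pip install cryptography); not the study's reference solution.
    from cryptography.fernet import Fernet

    key = Fernet.generate_key()            # fresh random key (url-safe base64)
    f = Fernet(key)

    token = f.encrypt(b"attack at dawn")   # ciphertext carries an HMAC tag
    assert f.decrypt(token) == b"attack at dawn"   # raises InvalidToken if tampered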

AI assistant tools for coding and other tasks are becoming more popular, with Microsoft-owned GitHub launching Copilot as a technical preview in 2021 as a way to “improve developer productivity”.

In its own research published in September this year, GitHub found that Copilot was making developers more productive, with 88% reporting themselves as more productive and 59% as less frustrated when coding. The main benefits were attributed to getting through repetitive tasks faster and completing lines of code more quickly.


The researchers from Stanford wanted to find out whether users “write more insecure code with AI assistants” and found this to be the case. They also found that those using assistants are “delusional” about the quality of that code.

The team wrote in their paper: “We observed that participants who had access to the AI assistant were more likely to introduce security vulnerabilities for the majority of programming tasks, yet also more likely to rate their insecure answers as secure compared to those in our control group.”

The researchers did, however, identify a mitigation. “Additionally, we found that participants who invested more in the creation of their queries to the AI assistant, such as providing helper functions or adjusting the parameters, were more likely to eventually provide secure solutions.”
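
To illustrate what such an “invested” query might look like (the study’s actual prompts are not reproduced here, so the wording below is hypothetical), compare a bare request with one that hands the assistant a helper signature and explicit constraints:

    # Hypothetical illustration of query effort; not taken from the study.

    # Low-effort query:
    #   "write code to encrypt a string"

    # Higher-effort query: supply a helper signature plus explicit constraints
    # for the assistant to fill in.
    def encrypt_message(plaintext: bytes, key: bytes) -> bytes:
        """Encrypt plaintext with an authenticated symmetric scheme.

        Constraints for the assistant:
        - use a vetted library such as "cryptography"; no home-rolled ciphers
        - authenticate the ciphertext so tampering is detectable
        - never reuse a nonce under the same key
        """
        ...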

Only three programming languages were used in the project: Python, C and Verilog. The study involved a relatively small number of participants with varying levels of experience, ranging from undergraduate students to industry professionals, all working in a purpose-built app that was monitored by the administrators.

The first prompt involved writing in Python, and those writing with the help of the AI were more likely to produce insecure or incorrect code. In total, 79% of the control group without AI help gave a correct answer, whereas just 67% of those with the AI got it right.

AI coding assistants: use with caution

It got worse in terms of the security of the code being created: those in the AI group were “significantly more likely to provide an insecure solution” or to use trivial ciphers to encrypt and decrypt strings. They were also less likely to conduct authenticity checks on the final value to ensure the process worked as expected.
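
To make the authenticity-check point concrete: an authenticated mode such as AES-GCM rejects ciphertext that has been altered, something a trivial cipher cannot do. A minimal sketch, again assuming the cryptography package rather than code from the study:

    # Minimal sketch of an authenticity check, assuming the "cryptography"
    # package; not code from the study.
    import os
    from cryptography.exceptions import InvalidTag
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    key = AESGCM.generate_key(bit_length=256)
    nonce = os.urandom(12)                       # 96-bit nonce, never reused per key
    ciphertext = AESGCM(key).encrypt(nonce, b"secret", None)

    tampered = ciphertext[:-1] + bytes([ciphertext[-1] ^ 1])  # flip one bit
    try:
        AESGCM(key).decrypt(nonce, tampered, None)
    except InvalidTag:
        print("tampering detected")              # the check the AI group tended to skip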

Authors Neil Perry, Megha Srivastava, Deepak Kumar and Dan Boneh wrote that the results “provide caution that inexperienced developers may be inclined to readily trust an AI assistant’s output, at the risk of introducing new security vulnerabilities. Therefore, we hope our study will help improve and guide the design of future AI code assistants.”

Peter van der Putten, director of the AI Lab at software vendor Pegasystems, said that despite its small scale the study was “very interesting” and produced results that could inspire further research into the use of AI assistants in coding and other areas. “It also aligns with some of our broader research on reliance on AI assistants in general,” he said.

He warned that users of AI assistants should build trust in the tool gradually, neither over-relying on it nor ignoring its limitations. “The acceptance of a technology isn’t just determined by our expectation of quality and performance, but also by whether it can save us time and effort. We are inherently lazy creatures,” he said. “In the grand scheme of things I am positive about the use of AI assistants, as long as user expectations are managed. This means defining best practices on how to use these tools, and potentially also additional capabilities to test for the quality of code.”

Read more: Compute power is becoming a bottleneck for AI development. Here's how you clear it.

Topics in this article: AI
