Right now, major tech firms are clamouring to replicate the runaway success of ChatGPT, the generative AI chatbot developed by OpenAI on top of its GPT-3.5 large language model. Much like potential game-changers of the past, such as cloud-based Software as a Service (SaaS) platforms or blockchain technology (emphasis on potential), established companies and start-ups alike are going public with LLMs and ChatGPT alternatives for fear of being left behind.
While many of these will succeed – some in the mass market and others with niche applications – many more will likely fail as the market consolidates.
Which companies, then, are best placed to challenge OpenAI?
Table of contents
- Companies with Large Language Model projects
  - Google – LaMDA
  - AI21 – Jurassic-2
  - Anthropic – Claude
  - Baidu – ERNIE 3.0
  - Nvidia – DGX AI
  - DeepMind – Chinchilla
  - Meta – LLaMA
Companies with Large Language Model projects
Google – LaMDA
Google’s LaMDA has attracted more attention from mainstream observers than any LLM outside of GPT-3, though not for quite the same reasons.
Months before ChatGPT exploded into national headlines in late 2022, LaMDA was proving controversial after Google engineer Blake Lemoine was suspended for claiming – falsely, as became evident – that it had developed sentience.
In reality, the LaMDA LLM operates similarly to its main competitor, except that it has fewer parameters: 137 billion, compared with 175 billion for GPT-3.5, the model used to train ChatGPT.
LaMDA is also the bedrock of Google’s chatbot competitor, named Bard, which the search giant is currently testing for search with select users. Bard had an inauspicious start, however, as it presented a factual error during a launch event.
AI21 – Jurassic-2
Israel-based start-up AI21, while less well-known than its rival OpenAI, is a serious challenger in the market. The company created the Jurassic-1 large language model in 2021 with a similar number of parameters to GPT-3 – 178 billion compared to 175 billion – and customisation capabilities.
March 2023 then saw the release of Jurassic-2, which focuses on optimised performance as opposed to size. According to AI21, the smallest version of Jurassic-2 outperforms even the largest version of its predecessor. It also contains a grammatical correction API and text segmentation capabilities.
Users of AI21 studio can train their own versions of the LLM with as few as 50-100 training examples, which then become available for exclusive use.
AI21 also deployed Jurassic-1, and now Jurassic-2, to underpin its WordTune Spices chatbot, which distinguishes itself as a ChatGPT alternative through live data retrieval and the citation of sources in its answers. Given the risks of factual error and plagiarism associated with LLM chatbots, this is a significant advantage in an increasingly competitive field.
Anthropic – Claude
Founded by former OpenAI employees, Anthropic is fast making waves as a rival to its quasi-progenitor.
The generative AI company has launched its own large language model, Claude, a ChatGPT alternative that boasts what Anthropic calls “constitutional AI”. In effect, the model is designed to act according to a set of programmed principles (i.e. its ‘constitution’), as opposed to ChatGPT, which is simply prohibited from answering certain controversial or dangerous queries.
Much like Microsoft’s investment in OpenAI, Google has invested $300m into Anthropic for a 10% stake in the company.
Baidu – ERNIE 3.0
Baidu – China’s answer to Google – is looking to combat its long-term struggles in the face of rival Tencent with its heavy investment in AI.
The team at Baidu has expanded its ERNIE 3.0 large language model into a new version called ERNIE 3.0 Titan. While its predecessor had just 10 billion parameters, Titan, built on Baidu’s PaddlePaddle deep learning platform, operates on 260 billion.
Titan’s creators claim that it is the “largest dense pre-trained model so far” and that it outperforms state-of-the-art models on natural language processing (NLP) tasks.
Nvidia – DGX AI
Hardware and software supplier Nvidia is currently core to the operation of ChatGPT, with an estimated 10,000 of the company’s GPUs used to train the chatbot and a predicted 30,000 to be used in future.
This dynamic could be upended, however, as Nvidia CEO Jensen Huang announced in February 2023 that the firm plans to make its DGX AI supercomputer available via the cloud.
Already accessible through Oracle Cloud Infrastructure and Microsoft Azure, the AI supercomputer will have the capacity to allow customers to train their own large language models.
Nvidia has seen a financial boost as companies such as Google and Microsoft look to it for the GPUs necessary for training.
DeepMind – Chinchilla
British AI company and Alphabet subsidiary DeepMind, famous for its AlphaGo program, is investing heavily in large language model research and development. DeepMind has iterated on multiple LLMs, including Gopher, Chinchilla and the RETRO system, which combines an LLM with an external database.
This experimentation is leading the way in more targeted and energy-efficient types of LLM. Chinchilla has just 70 billion parameters, as opposed to rivals with double or triple that number, or more, yet outperforms the larger Gopher at certain tasks. Likewise the 7.5-billion-parameter RETRO, whose external database allows it to outperform vastly larger models.
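The trade-off behind Chinchilla is a compute-budget one: for a fixed amount of training compute, a smaller model trained on more data can beat a much larger one. A rough illustrative sketch in Python, using the widely cited approximation that training compute is about 6 × parameters × tokens FLOPs, with publicly reported (approximate) figures for Gopher and Chinchilla:

```python
def training_flops(params: float, tokens: float) -> float:
    """Approximate training compute using the common C ~= 6 * N * D rule,
    where N is parameter count and D is training tokens."""
    return 6 * params * tokens

# Gopher: ~280bn parameters trained on ~300bn tokens
gopher = training_flops(280e9, 300e9)

# Chinchilla: ~70bn parameters trained on ~1.4tn tokens
chinchilla = training_flops(70e9, 1.4e12)

# Both land in the same ballpark (~5e23 FLOPs), yet the smaller,
# longer-trained Chinchilla comes out ahead on many benchmarks.
print(f"Gopher:     {gopher:.2e} FLOPs")
print(f"Chinchilla: {chinchilla:.2e} FLOPs")

# Chinchilla trains on roughly 20 tokens per parameter,
# versus about 1 per parameter for Gopher.
print(f"Chinchilla tokens per parameter: {1.4e12 / 70e9:.0f}")
```

The exact figures are approximations drawn from public reporting, but they show why parameter count alone is a poor proxy for capability.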
Meta – LLaMA
Not content to invest in the metaverse, Meta has also entered the LLM space with its LLaMA model. Mark Zuckerberg’s company does not yet have a publicly available ChatGPT alternative, but one is in development.
Unlike many others, the 65-billion parameter LLM has been made open source (upon request, crucially) with the intention of knowledge sharing and crowdsourcing bug fixes.
But just a week after it launched, a torrent for the LLM found its way to the wider internet via a 4chan leak, prompting fears that such unfettered access could be exploited for phishing and other cybercrime activities.
Read more: This is how GPT-4 will be regulated