The race for AI supremacy is once again accelerating: xAI CEO Elon Musk announced on X that his company brought its Colossus AI training cluster, which Musk bills as the world’s “most powerful,” online over the weekend.
“This weekend, the @xAI team brought our Colossus 100k H100 training cluster online. From start to finish, it was done in 122 days. Colossus is the most powerful AI training system in the world. Moreover, it will double in size to 200k (50k H200s) in a few months. Excellent work by the team, Nvidia and our many partners/suppliers,” Musk wrote in a post on X on September 2, 2024.
Musk’s “most powerful” claim is based on the number of GPUs employed by the system. With 100,000 Nvidia H100s driving it, Colossus is estimated to be larger than any other AI system developed to date.
Musk began purchasing tens of thousands of GPUs in April 2023 to accelerate his company’s AI efforts, shortly after signing an open letter calling for an industrywide, six-month “pause” on AI development. In March of that year, Musk claimed that the company would leverage AI to “detect & highlight manipulation of public opinion” on Twitter, though the GPU supercomputer will likely also be used to train xAI’s large language model (LLM), Grok.
Grok, which xAI introduced in 2023, competes with rivals such as ChatGPT, Gemini, Llama 3.1, and Claude. The company released the updated Grok-2 as a beta in August 2024. “We have introduced Grok-2, positioning us at the forefront of AI development,” xAI wrote in a recent blog post. “Our focus is on advancing core reasoning capabilities with our new compute cluster. We will have many more developments to share in the coming months.”
Musk claims that he can also develop Tesla into “a leader in AI & robotics.” However, a recent report from CNBC suggests that Musk has been diverting shipments of Nvidia’s highly sought-after GPUs from the electric automaker to xAI and X. Doing so could delay Tesla’s efforts to install the compute resources needed to develop its autonomous vehicle technology and the Optimus humanoid robot.
“Elon prioritizing X H100 GPU cluster deployment at X versus Tesla by redirecting 12k of shipped H100 GPUs originally slated for Tesla to X instead,” an Nvidia memo from December obtained by CNBC reads. “In exchange, original X orders of 12k H100 slated for [January] and June to be redirected to Tesla.”