Cerebras claims its technology is 75 times faster than leading hyperscaler GPUs.
Cerebras has achieved a processing speed of 969 tokens per second with Meta’s Llama 3.1 405B large language model, which the company says is 75 times faster than the quickest GPU-based AI service on Amazon Web Services.
This performance was recorded on the Cerebras Inference cloud AI service, which relies on the company’s third-generation Wafer Scale Engines rather than traditional GPUs from Nvidia or AMD. Since its launch in August, Cerebras Inference has been touted as significantly faster at generating tokens, the basic units that make up a large language model’s (LLM’s) responses. Initially, the service was reported to be about 20 times faster than Nvidia GPUs offered by cloud providers such as Amazon Web Services for smaller models like Llama 3.1 8B and Llama 3.1 70B.
In July, however, Meta introduced the Llama 3.1 405B model, which is significantly larger, containing 405 billion parameters compared to the 70 billion of Llama 3.1 70B. Cerebras has now demonstrated that its Wafer Scale Engine processors can run this vast LLM at what it describes as “instant speed,” delivering 969 tokens per second with a time-to-first-token of just 0.24 seconds: a record not only for Cerebras hardware but also for the Llama 3.1 405B model itself.
Compared with Nvidia GPUs available through AWS, Cerebras Inference ran 75 times faster, and it was still 12 times quicker than the fastest Nvidia GPU setup from Together AI. Even SambaNova, a rival AI processor designer, was outperformed by a factor of six.
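To make those multipliers concrete, here is a back-of-envelope sketch in Python built only from the figures quoted above. It assumes the 969 tokens-per-second rate is sustained for an entire response, and it derives the competitors’ implied throughput simply by dividing by the quoted multipliers.

```python
# Back-of-envelope math using only the figures quoted above. It assumes the
# 969 tokens/s rate is sustained for the full response.

TOKENS_PER_SECOND = 969        # Cerebras Inference running Llama 3.1 405B
TIME_TO_FIRST_TOKEN = 0.24     # seconds

def response_time(output_tokens: int) -> float:
    """Estimate end-to-end latency for an answer of a given length."""
    return TIME_TO_FIRST_TOKEN + output_tokens / TOKENS_PER_SECOND

# Throughput implied for the compared services by the quoted multipliers.
implied_rates = {
    "AWS (fastest GPU-based service)": TOKENS_PER_SECOND / 75,        # ~13 tokens/s
    "Together AI (fastest Nvidia GPU setup)": TOKENS_PER_SECOND / 12,  # ~81 tokens/s
    "SambaNova": TOKENS_PER_SECOND / 6,                                # ~162 tokens/s
}

if __name__ == "__main__":
    print(f"1,000-token answer on Cerebras Inference: ~{response_time(1000):.2f} s")
    for name, rate in implied_rates.items():
        print(f"{name}: ~{rate:.0f} tokens/s implied")
```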
To put this into perspective, Cerebras prompted both Fireworks (the fastest AI cloud service using GPUs) and its own Inference service to develop a chess program in Python. It took Cerebras Inference merely three seconds to complete the task, while Fireworks required 20 seconds.
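For anyone curious how such a wall-clock comparison might be reproduced, the following is a minimal sketch. It assumes the provider exposes an OpenAI-compatible chat endpoint; the base URL and model name are placeholders, not values taken from the article.

```python
# A minimal sketch of how a wall-clock comparison like this could be timed.
# Assumptions not taken from the article: the provider exposes an
# OpenAI-compatible chat endpoint, and the base URL and model name below
# are placeholders to be replaced with real values.
import time

from openai import OpenAI

client = OpenAI(
    base_url="https://inference-provider.example/v1",  # hypothetical endpoint
    api_key="YOUR_API_KEY",
)

prompt = "Write a chess program in Python."

start = time.perf_counter()
response = client.chat.completions.create(
    model="llama-3.1-405b",  # placeholder model identifier
    messages=[{"role": "user", "content": prompt}],
)
elapsed = time.perf_counter() - start

answer = response.choices[0].message.content
print(f"Generated {len(answer)} characters in {elapsed:.1f} s")
```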
Here’s a glimpse of what instant 405B performance looks like: Cerebras versus the fastest GPU cloud. pic.twitter.com/d49pJmh3yT (November 18, 2024)
Cerebras announced that Llama 3.1 405B running on its system is the world’s fastest frontier model, 12 times quicker than GPT-4o and 18 times faster than Claude 3.5 Sonnet. The company attributes this to Meta’s open approach paired with its own inference technology, which together allow Llama 3.1 405B to run more than 10 times faster than the leading closed frontier models.
Even when the prompt size was increased from 1,000 tokens to 100,000 tokens (on the order of 75,000 words of input), Cerebras Inference still managed 539 tokens per second. Of the five other services capable of handling a task this large, the next best achieved only 49 tokens per second.
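The rough arithmetic behind that comparison is sketched below; the 0.75 words-per-token ratio is a common rule of thumb rather than a figure from the article, and the 1,000-token answer length is assumed purely for illustration.

```python
# Rough arithmetic for the long-context comparison above. The 0.75
# words-per-token ratio is a common rule of thumb, not a figure from the article.

PROMPT_TOKENS = 100_000
WORDS_PER_TOKEN = 0.75            # rough heuristic for English text
CEREBRAS_RATE = 539               # tokens/s with the 100,000-token prompt
NEXT_BEST_RATE = 49               # tokens/s, next-fastest of the services tested

print(f"Prompt length: ~{PROMPT_TOKENS * WORDS_PER_TOKEN:,.0f} words")

ANSWER_TOKENS = 1_000             # assumed answer length for illustration
for name, rate in [("Cerebras Inference", CEREBRAS_RATE),
                   ("Next-best service", NEXT_BEST_RATE)]:
    print(f"{name}: ~{ANSWER_TOKENS / rate:.1f} s to generate a {ANSWER_TOKENS:,}-token answer")
```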
Cerebras also highlighted that a single unit of its second-generation Wafer Scale Engine outperformed the Frontier supercomputer by 768 times in a molecular dynamics simulation. Frontier, which is powered by 9,472 AMD Epyc CPUs, was the world’s fastest supercomputer until the recent launch of El Capitan.
Moreover, the Cerebras chip surpassed the Anton 3 supercomputer’s performance by 20%, a notable achievement considering Anton 3 was specifically designed for molecular dynamics simulations; it also marked the first time a computer reached over one million simulation steps per second.