The AI Daily Brief: Artificial Intelligence News and Analysis - Groq is 10x Faster than ChatGPT and Gemini
Episode Date: February 20, 2024
Alongside Gemini 1.5's massive new context window, and Sora's mindblowing video generation, Groq has come along to redefine how fast we think LLMs can be. NLW explores people's reactions and the implications for new use cases. INTERESTED IN THE AI EDUCATION BETA? Learn more and sign up https://bit.ly/aibeta Today's Sponsors: Notion - Notion AI. Knowledge, answers, ideas. One click away. - https://notion.com/aibreakdown ABOUT THE AI BREAKDOWN The AI Breakdown helps you understand the most important news and discussions in AI. Subscribe to The AI Breakdown newsletter: https://theaibreakdown.beehiiv.com/subscribe Subscribe to The AI Breakdown on YouTube: https://www.youtube.com/@TheAIBreakdown Join the community: bit.ly/aibreakdown Learn more: http://breakdown.network/
Transcript
Today on the AI Breakdown, we're talking about Groq. That's Groq with a Q, not Grok with a K, and at nearly 500 tokens a second, it is redefining how fast LLMs can be.
Before that on the brief, SoftBank is exploring a $100 billion AI chip project.
The AI Breakdown is a daily podcast and video about the most important news and discussions in AI.
Go to breakdown.network for more information about our YouTube, our Discord, and our newsletter.
Welcome back to the AI Breakdown Brief, all the AI headline news you need in around five minutes.
There has been a lot of scuttlebutt in the news around AI chip efforts.
OpenAI's Sam Altman, of course, has recently redefined the term ambition with his interest
in exploring a $5 to $7 trillion initiative to build a network of chip fabrication plants around
the world, but also with a focus on the United States.
Perhaps in the realm of a little bit more realistic, if still intensely ambitious,
Bloomberg reports that SoftBank's Masayoshi Son is exploring a $100 billion AI chip project.
Now, the ups and downs of SoftBank are, of course, at this point, legendary, but they are certainly
currently on an upswing, which has been largely driven by their 90% stake in Arm Holdings,
the AI chip designer, which has seen enormous stock price increases this year.
Bloomberg writes that this new $100 billion chip venture would complement Arm, and is codenamed
Izanagi.
Apparently, one of the scenarios that SoftBank is imagining would see them putting in about
$30 billion, with another $70 billion coming from institutions in the Middle East.
At the moment, SoftBank has around $41 billion in cash and cash equivalents on hand,
which would mean that this would represent a major part of their liquid assets being invested
into this new project.
Over the last 10 trading days, as Arm has increased by more than 80% in the markets,
SoftBank shares have gained about 30%, and after this news broke,
SoftBank stock was up another 3%.
Now, at the moment, a $100 billion chip project would represent a little less than a fifth
of the global semiconductor market, but I don't think anyone on the planet, especially not
people who are paying attention to AI, thinks that that market is going to stay this size for very
long. Now, moving on from chips into the realm of data, last year in April, Reddit started to posture
like it was going to take a more harsh stance towards companies that were scraping its data to train
their AI models. Now, around 10 months later, they have actually signed a $60 million annual deal with an
as-yet unnamed AI company to allow that company to train on Reddit content. One thing we don't know is
whether the deal is exclusive, but the contact who gave Bloomberg the news speculated that more
likely was that the contract would serve as a model for future agreements with other AI companies
as well. Now, from a Reddit standpoint, this is something they wanted to get done before a potential
IPO, which could happen as soon as next month. And obviously from the larger AI industry
standpoint, any of these types of agreements right now, as legal battles around copyright and fair
use are being fought in the courts, are going to be influential in shaping how the industry
moves next when it comes to proprietary sources of data. Next, we have a little bit of
follow-up around Sora. If you've been listening to my shows recently, you will know that Sora
has been the major topic of conversation over the last few days, really ever since OpenAI announced it,
but some people aren't quite as impressed. Yann LeCun, who is of course the head of Meta's
AI department, said that if OpenAI's goal is really to simulate the world, Sora's approach
is ill-suited for that. He said, modeling the world for action by generating pixels is wasteful
and doomed to failure. Writes The Decoder, there has been a historic debate about the merits of
generative versus discriminative classification methods, with generative methods considered more
difficult and less effective. LeCun believes that generative models for sensory inputs will fail
because it is too difficult to deal with the prediction uncertainty of high-dimensional continuous
sensory inputs. Basically, he's saying that while generative models work for text, because
there are a finite number of symbols, uncertainty is easier dealt with in that context.
Sensory input, on the other hand, just is a huge additional level of complexity. It will perhaps
surprise you not at all to know that LeCun and Meta have their own approach to this problem,
which they're calling the Video Joint Embedding Predictive Architecture, or V-JEPA.
Again from The Decoder, the model predicts complex interactions
and interprets them by adding hidden parts of video to convey the dynamics of objects and interactions
to the AI.
V-JEPA focuses on predictions in a broader conceptual space similar to human cognitive image processing.
This architecture allows V-JEPA to adapt to different tasks by adding a small task-specific layer
rather than retraining the entire model.
Elon Musk also had some words for OpenAI's Sora, saying on Twitter,
where Tesla video generation exceeds OpenAI,
is that it predicts extremely accurate physics. That is essential for self-driving.
So again, what we have here is another critique, not of Sora's ability necessarily to make
incredible-looking videos, but instead how accurate it truly is as a representation of the
real world and its ability to be a world simulator, which seems, of course, to be the ultimate
goal in OpenAI's pursuit of AGI. Now, staying on the theme of Elon or at least electric
vehicles for just a minute, Chinese EV maker XPeng has announced that it would hire
4,000 people and invest millions in AI in a strategy that is contrasting with other Chinese and global
EV makers who are, instead of investing more, currently racing to cut their own costs.
Microsoft continues its streak of investing in European countries, announcing an AI infrastructure
bid in Spain coming along with a $2.1 billion investment. This of course follows their recent announcement
of a $3.45 billion AI-focused investment in Germany, and will be centered around AI and cloud
infrastructure. Lastly, today, one that I'm only just starting to see talked about on
Twitter a little bit. The University of Pennsylvania's Penn Engineering Today wrote a blog post at the end
of last week called New Chip Opens Door to AI Computing at Light Speed. The piece begins,
Penn Engineers have developed a new chip that uses light waves rather than electricity to perform the
complex math essential to training AI. The chip has the potential to radically accelerate the processing
speed of computers while also reducing their energy consumption. They call this a silicon photonic or
SiPh chip, and it's based on recent research around manipulating materials at the nanoscale to perform
mathematical computations using light. And so we close this brief, basically where we began,
with the continued focus on AI chips. In many ways, these two bookending stories represent the spectrum
of what we're seeing right now. On the one hand, people trying to solve the compute access
problem by throwing money at it and just building out more infrastructure, versus on the other
hand, thinking in fundamentally new ways about the actual underlying technology itself. It is almost
certain that as the AI revolution continues, we will see immense developments at both ends
of this spectrum. For now, though, that is going to do it for today's AI Breakdown Brief.
Up next, the main AI Breakdown.
Hello, AI friends. Quick note before we get back into the show, we have just opened up
registration for the March edition of the AI Education Beta Program.
The whole philosophy of this program is to get you learning by doing.
So we have short tutorials, think three minutes, five minutes, seven minutes, around specific
features and use cases in AI, followed by challenges that are step-by-step instructions that
get you actually using the most interesting and relevant tools.
We have now built out a library of more than 100 of these lessons and step-by-step companion
instructions, and we'll be dropping more each week.
For the first time, we'll also be moving beta users this month to a new dedicated platform
where you can access that library of content, build lists of lessons you want to learn from
later, and other features that we hope will help make this the single best AI learning
experience available.
If you want to check it out, go to bit.ly/aibeta.
That's bit.ly/aibeta.
Registration is only open this week until next Monday, so go check it out.
A quick message before we get back to the episode today.
At this point, you guys know that Notion is one of the major tools that I use day in and day out
across the Breakdown Network, the AI Education Beta project, basically anything that I'm doing
in any sort of professional or entrepreneurial endeavor is going to be anchored by Notion.
Now, you also know that one of the big themes that I keep talking about for 2024 when it comes to
artificial intelligence is the integration of AI into our workflows. I think in many ways it's not
just about which third-party AI tool is best for any given use case, but how they actually
fit into what we're doing in ways that are actually time-saving. And that's why I love that
Notion now has AI so deeply integrated across its entire suite of tools, which means that it's
everywhere in your entire workspace. Now, for those of you who don't know, Notion combines your
notes, documents, and projects into one space that is simple and beautifully designed. It's your one
place to connect teams, tools, and knowledge, so you can do your most meaningful work. Unlike
other solutions, it doesn't have you bouncing between six different apps, it is seamlessly
integrated, infinitely flexible, and incredibly easy to use. Now, with the new fully integrated
Notion AI, you can work faster, write better, think bigger, and take care of tons of tasks
that might normally take you minutes or hours in just seconds. One of my favorite use cases is to use
Notion for brainstorming. So, for example, what would a great launch strategy be for some project? Use
Notion AI to help you think through all of the different dimensions of how you could tell that story.
Now, the proof is in the pudding and Notion is used by over half of Fortune 500 companies.
And most importantly, probably for you guys, the teams that do use Notion, send less email,
cancel more meetings, save time searching for work, and reduce spending on tools.
Right now, you can try Notion for free when you go to notion.com/aibreakdown.
That's all lowercase letters, notion.com/aibreakdown, to try the powerful, easy-to-use
Notion AI today.
And of course, when you use our link, you're supporting the show.
One more time, that's notion.com/aibreakdown.
On a recent video about Sora that I published,
YouTube commenter coldly analytical wrote,
I regard February 15th, 2024 as AI's Day Zero.
Sora and Gemini 1.5 both announced on the same day,
and both pushing us from the beta test phase into the AI is a real usable technology zone.
I think it's an astute comment, and quietly,
there is another leg of the stool for AI's next phase that came over the weekend. It started for most
with a tweet from Matt Shumer, the CEO of HyperWrite, who said, Wild tech you have to try:
Groq, groq.com. They are serving Mixtral at nearly 500 tokens a second. Answers are pretty much
instantaneous, opens up new use cases, and completely changes the UX possibilities of existing
ones. Matt was the first to notice a live demo from what claims to be the world's fastest
LLM. When you go to groq.com, it says, we'd suggest asking about a piece of history, requesting a
guide on how to achieve your New Year resolution, or copy and pasting in some text to be translated
by prompting make it French. This alpha demo lets you experience ultra-low latency performance using
the foundational LLM, Llama 2 70B, created by Meta AI, running on the Groq LPU Inference
Engine. Now, they actually give you two options for model. You can use either Mixtral or
Llama 2 70B, but suffice it to say that the speed at which it generates responses has people's
minds scrambling. A little later over the weekend, Matt again writes, the first public demo using
Groq, a lightning fast AI answers engine. It writes factual, cited answers with hundreds of words in less
than a second. More than three quarters of the time is spent searching, not generating. The LLM runs in a
fraction of a second. So what is going on? Well, Groq, on its website, in its Why Groq section,
says, Groq is on a mission to set the standard for GenAI inference speed, helping real-time AI
applications come to life. In its FAQ section, Groq writes, what is the LPU inference engine?
An LPU inference engine with LPU standing for language processing unit is a new type of end-to-end processing
unit system that provides the fastest inference for computationally intensive applications
with a sequential component to them, such as AI language applications or LLMs.
On the question of why it is so much faster than GPUs for LLMs and GenAI, Groq writes,
The LPU is designed to overcome the two LLM bottlenecks, compute density and memory bandwidth.
An LPU has greater compute capacity than a GPU and CPU in regards to LLMs.
This reduces the amount of time per word calculated, allowing
sequences of text to be generated much faster. Additionally, eliminating external memory bottlenecks
enables the LPU inference engine to deliver orders of magnitude better performance on LLMs compared to GPUs.
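To make the memory bandwidth bottleneck in that FAQ answer a little more concrete, here is a rough back-of-envelope sketch in Python. It assumes a common rule of thumb rather than anything Groq has published: when a model generates one token at a time for a single request, every weight has to be read from memory once per token, so the speed ceiling is roughly memory bandwidth divided by model size. The function name and every number below are illustrative assumptions, not Groq or GPU specifications.

```python
def max_tokens_per_second(params_billions: float,
                          bytes_per_param: float,
                          bandwidth_gb_per_s: float) -> float:
    """Rough ceiling on single-request generation speed.

    Assumes all weights are read from memory once per generated token,
    and that nothing else (compute, networking) is the limiting factor.
    """
    model_gb = params_billions * bytes_per_param  # billions of params x bytes each = GB
    return bandwidth_gb_per_s / model_gb


# Hypothetical numbers, chosen only to show the shape of the calculation.
MODEL_PARAMS_B = 70.0        # a 70B-parameter model, like the one in the demo
BYTES_PER_WEIGHT = 2.0       # assumes 16-bit weights
EXTERNAL_MEM_GBPS = 2_000.0  # assumed bandwidth of external memory
ON_CHIP_MEM_GBPS = 80_000.0  # assumed aggregate bandwidth of on-chip memory

external_bound = max_tokens_per_second(MODEL_PARAMS_B, BYTES_PER_WEIGHT, EXTERNAL_MEM_GBPS)
on_chip_bound = max_tokens_per_second(MODEL_PARAMS_B, BYTES_PER_WEIGHT, ON_CHIP_MEM_GBPS)

print(f"external memory ceiling: {external_bound:.1f} tokens/s")
print(f"on-chip memory ceiling:  {on_chip_bound:.1f} tokens/s")
```

Under this simplified model, the ceiling scales directly with bandwidth, which is one way to read the claim that removing external memory bottlenecks matters so much for LLMs.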
Jay Scambler on Twitter wrote a slightly more comprehensive explanation which I found useful.
Jay writes,
Groq is serving the fastest responses I've ever seen. We're talking almost 500 tokens a second.
I did some research on how they were able to do it. Turns out they developed their own hardware
that utilizes LPUs instead of GPUs. Groq developed a novel processing unit known as
the Tensor Streaming Processor, or TSP, which they categorize as a linear processor unit, or LPU.
Unlike traditional GPUs that are parallel processors with hundreds of cores designed for graphics
rendering, LPUs are architected to deliver deterministic performance for AI computations.
The LPU's architecture is a departure from the SIMD, single instruction multiple data, model used
by GPUs, and favors a more streamlined approach that eliminates the need for complex scheduling hardware.
This design allows every clock cycle to be utilized effectively, ensuring consistent latency and throughput.
For developers, this means that performance can be precisely predicted and optimized, which is critical in
real-time AI applications. Energy efficiency is another area where LPUs shine. By reducing the overhead
of managing multiple threads, and avoiding the underutilization of cores, LPUs can deliver more
computations per watt. Groq's innovative chip design allows multiple TSPs to be linked together
without the traditional bottlenecks found in GPU clusters, making them extremely scalable. This enables
linear scaling of performance as more LPUs are added, simplifying the hardware requirements
for large-scale AI models, and making it easier for developers to scale their applications
without re-architecting their systems.
So what does this all mean?
LPUs could provide a massive improvement compared to GPUs
for serving AI applications in the future.
If anything, it will be great to have alternative high-performing hardware
since A100s and H100s are so in demand.
So even if all of that doesn't make it necessarily too much clearer,
what you should be taking away is that there is a new hardware approach underlying this.
This is not a different model akin to GPT-4 or Gemini or anything like that.
This is a new approach to processing that lies underneath.
Carlos Perez at Intuit Machine explains
further. He writes, Groq is a radically different kind of AI architecture. Among the new crop of
AI chip startups, Groq stands out with a radically different approach centered around its compiler
architecture for optimizing a minimalist yet high performance architecture. Groq's secret sauce is this
compiler-first method that shuns complexity in favor of tailored efficiency. At the heart of Groq's
architecture is an almost surprisingly bare-bones design that does away with unnecessary logic
in favor of raw parallel throughput. The hardware itself is comparable to an ASIC, an application-specific
integrated circuit finely tuned for machine learning. However, unlike a fixed function
ASIC, Groq leverages a custom compiler that can adapt and optimize across different models.
It is this combination of a streamlined architecture and an intelligent compiler that sets Groq apart.
The key insight is that many AI chip stack components, like GPUs, bring extraneous hardware
and bloat.
Groq returns to first principles recognizing that machine learning workloads are about
massive parallelism over simple data types and operations.
By eliminating generic hardware and even concepts like locality, the design maximizes
throughput and efficiency.
This is enabled by Groq's compiler that sits between software frameworks like
TensorFlow and the hardware. The compiler analyzes and optimizes neural network graphs,
tailoring and mapping them to the underlying architecture for accelerated execution. It breaks
computations into the smallest operations to unlock parallelism. The compiler also enables
capabilities like batch size 1 inference that ensures all hardware is usefully leveraged. Critically,
Groq built its compiler before even finalizing the hardware design. The software insights
directly informed the architecture. This co-design process allowed inference-specific optimization
without legacy limitations. The innovative compiler-first methodology allows custom optimization
that balances flexibility with performance.
So basically the idea here is that whereas, for example,
Nvidia GPUs do lots of different things.
They're used to run gaming.
They were for a time used for crypto mining.
The Groq chip is totally optimized for the generative AI world.
Carlos used that phrase first principles,
and that's really what this seems like,
a design from the ground up based on this particular use case,
which is admittedly a set of different use cases.
Still, for most people, this is really just all about speed.
Dina Yerlin writes, side-by-side Groq versus GPT-3.5,
completely different user experience, a game changer for products that require low latency.
Ethan Mollick writes, quote, GPT-3.5 class LLMs are too slow.
Sure, that was true last week.
Here is Groq running Llama 2.
My favorite comment on this when I posted the same video on LinkedIn,
it is too fast, it shouldn't be this fast.
Tom Osman writes,
love seeing all the Groq demos on the feed,
but is it only good for working with LLMs?
Answer?
Nope, it's insane at other stuff too.
Watch this clip from Groq Labs that shows it running StyleCLIP on an image to create eight
different styles, in 1024 pixels in just 0.185 seconds. Basically, this is showing an image generation
capacity that is equally, impressively fast. Gabor Cselle writes, comparing time to complete answer
for a simple code debugging question. Groq wins on speed, 10x faster than Gemini, 18x faster than
ChatGPT, although Gabor did say that Gemini wins on quality of answer. Some people think this is so
disruptive that we're going to see Groq get scooped up by one of the big AI labs. Faren Mather
writes prediction, Groq will get a $10 billion acquisition offer within this month. Groq's Tom Ellis responded
and said, add two zeros and we'll think about it. Just kidding, we're not for sale. But we are building
out more and more infra every day to serve customers with the lowest latency LLMs available.
Now, to the extent that there has been any critique of this or skepticism, it's been around the potential
price. Felix Red Panda writes, how does Groq make economic sense? One of these cards costs 20k and has
0.23 gigabytes of memory. So do people buy 320 of these cards and fill two full racks with them to serve a single
Llama 70B for 10 million including servers? That can't be how this works, right? Bindu Reddy from
Abacus, however, bites back, saying, I'm seeing lots of takes claiming that Groq does not make
economic sense. Barely anything makes economic sense at the beginning. LLM companies still lose billions.
Vision Pro is too expensive. Even the much-loved Cybertruck is too expensive. The cool thing about
Groq is the blazing fast inference. The economics will make sense in time. Now others are just
thinking about what opportunities this opens up. Responding to a question, what are some novel
use cases now possible because of this speed, AI solopreneur levelsio says,
So I thought about this. If you hook up a just as fast text to speech model and fast Whisper speech
to text model to Groq, you can have instant conversations from human to AI and back without any delays,
like ChatGPT's TTS five second delay. Arvid Kahl responded to that. AI could respond
while you're still speaking the last syllables of your word. I think we have to rethink what this could be
used for. Honestly, I feel the humans are too slow to be even interesting for this kind of speed.
Can you imagine how fast this thing could code as an autonomous agent?
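The voice conversation idea above comes down to a latency budget: speech to text, then the LLM, then text to speech. Here is a minimal sketch of that arithmetic in Python. Only the roughly 500 tokens a second figure comes from this episode; the reply length, the slower comparison speed, and the speech stage timings are hypothetical placeholders, and the function names are made up for illustration.

```python
def generation_seconds(reply_tokens: int, tokens_per_second: float) -> float:
    """Time for the LLM to produce a reply of the given length."""
    return reply_tokens / tokens_per_second


def round_trip_seconds(speech_to_text_s: float,
                       reply_tokens: int,
                       tokens_per_second: float,
                       text_to_speech_s: float) -> float:
    """Total delay if the three stages run strictly one after another."""
    return (speech_to_text_s
            + generation_seconds(reply_tokens, tokens_per_second)
            + text_to_speech_s)


REPLY_TOKENS = 100       # hypothetical length of a spoken reply
SPEECH_TO_TEXT_S = 0.3   # hypothetical transcription delay
TEXT_TO_SPEECH_S = 0.3   # hypothetical delay before audio starts playing

fast_llm_total = round_trip_seconds(SPEECH_TO_TEXT_S, REPLY_TOKENS, 500.0, TEXT_TO_SPEECH_S)
slow_llm_total = round_trip_seconds(SPEECH_TO_TEXT_S, REPLY_TOKENS, 30.0, TEXT_TO_SPEECH_S)

print(f"LLM at 500 tokens/s: {fast_llm_total:.2f} s round trip")
print(f"LLM at 30 tokens/s:  {slow_llm_total:.2f} s round trip")
```

With these placeholder numbers, the slower model's generation time dominates the round trip, while at 500 tokens a second the speech stages become the larger share of the delay.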
Sully Omar writes, imagine the possibilities with a model that has 1 million context like Gemini Pro 1.5,
instant and cheap inference like Groq, GPT-4-like reasoning.
We'll be building insane things.
And Andrew V responds, imagine how dated this post will be in a year and what our imagination will be dreaming of then.
Now, obviously, people are only just starting to dig into Groq and there will be a lot more to talk about in the coming days and weeks,
but you can go check it out right now at groq.com.
This is definitely one that benefits from a live
demo, so I hope you go check it out. For now, though, that is going to do it for today's AI
Breakdown. Until next time, peace.
