The AI Daily Brief: Artificial Intelligence News and Analysis - Groq is 10x Faster than ChatGPT and Gemini

Starting point is 00:00:00 Today on the AI Breakdown, we're talking about Groq. That's Groq with a Q, not Grok with a K, and at nearly 500 tokens a second, it is redefining how fast LLMs can be. Before that on the brief, SoftBank is exploring a $100 billion AI chip project. The AI Breakdown is a daily podcast and video about the most important news and discussions in AI. Go to Breakdown.network for more information about our YouTube, our Discord, and our newsletter. Welcome back to the AI Breakdown Brief, all the AI headline news you need in around five minutes. There has been a lot of scuttlebutt in the news around AI chip efforts. OpenAI's Sam Altman, of course, has recently redefined the term ambition with his interest in exploring a $5 to $7 trillion initiative to build a network of chip fabrication plants around

Starting point is 00:00:46 the world, but also with a focus on the United States. Perhaps in the realm of a little bit more realistic, if still intensely ambitious, Bloomberg reports that SoftBank's Masayoshi Son is exploring a $100 billion AI chip project. Now, the ups and downs of SoftBank are, of course, at this point, legendary, but they are certainly currently on an upswing, which has been largely driven by their 90% share of Arm Holdings, the AI chip designer, which has seen enormous stock price increases this year. Bloomberg writes that this new $100 billion chip venture would complement Arm, and is codenamed Izanagi.

Starting point is 00:01:20 Apparently, one of the scenarios that SoftBank is imagining would see them putting in about $30 billion, with another $70 billion coming from institutions in the Middle East. At the moment, SoftBank has around $41 billion in cash and cash equivalents on hand, which would mean that this would represent a major part of their liquid assets being invested into this new project. Over the last 10 trading days, as Arm has increased by more than 80% in the markets, SoftBank shares have gained about 30%, and after this news broke, SoftBank stock was up another 3%.

Starting point is 00:01:47 Now, at the moment, a $100 billion chip project would represent a little less than a fifth of the global semiconductor market, but I don't think anyone on the planet, especially not people who are paying attention to AI, thinks that that market is going to stay this size for very long. Now, moving on from chips into the realm of data, last year in April, Reddit started to posture like it was going to take a more harsh stance towards companies that were scraping its data to train their AI models. Now, around 10 months later, they have actually signed a $60 million annual deal with an as-yet unnamed AI company to allow that company to train on Reddit content. One thing we don't know is whether the deal is exclusive, but the contact who gave Bloomberg the news speculated that more

Starting point is 00:02:26 likely was that the contract would serve as a model for future agreements with other AI companies as well. Now, from a Reddit standpoint, this is something they wanted to get done before a potential IPO, which could happen as soon as next month. And obviously from the larger AI industry standpoint, any of these types of agreements right now, as legal battles around copyright and fair use are being fought in the courts, are going to be influential in shaping how the industry moves next when it comes to proprietary sources of data. Next, we have a little bit of follow-up around Sora. If you've been listening to my shows recently, you will know that Sora has been the major topic of conversation over the last few days, really ever since OpenAI announced it,

Starting point is 00:03:01 but some people aren't quite as impressed. Yann LeCun, who is of course the head of Meta's AI department, said that if OpenAI's goal is really to simulate the world, that Sora's approach is ill-suited for that. He said, modeling the world for action by generating pixels is wasteful and doomed to failure. Writes The Decoder, there has been a historic debate about the merits of generative versus discriminative classification methods, with generative methods considered more difficult and less effective. LeCun believes that generative models for sensory inputs will fail because it is too difficult to deal with the prediction uncertainty of high-dimensional continuous sensory inputs. Basically, he's saying that while generative models work for text, because

Starting point is 00:03:36 there are a finite number of symbols, uncertainty is easier dealt with in that context. Sensory input, on the other hand, just is a huge additional level of complexity. It will perhaps surprise you not at all to know that LeCun and Meta have their own approach to this problem, which they're calling the Video Joint Embedding Predictive Architecture, or V-JEPA. Again from The Decoder, the model predicts complex interactions and interprets them by adding hidden parts of video to convey the dynamics of objects and interactions to the AI. V-JEPA focuses on predictions in a broader conceptual space similar to human cognitive image processing.

Starting point is 00:04:06 This architecture allows V-JEPA to adapt to different tasks by adding a small task-specific layer rather than retraining the entire model. Elon Musk also had some words for OpenAI's Sora, saying on Twitter, where Tesla video generation exceeds OpenAI is that it predicts extremely accurate physics. That is essential for self-driving. So again, what we have here is another critique, not of Sora's ability necessarily to make incredible-looking videos, but instead how accurate it truly is as a representation of the real world and its ability to be a world simulator, which seems, of course, to be the ultimate

Starting point is 00:04:37 goal in OpenAI's pursuit of AGI. Now, staying on the theme of Elon or at least electric vehicles for just a minute, Chinese EV maker XPeng has announced that it would hire 4,000 people and invest millions in AI in a strategy that contrasts with other Chinese and global EV makers who are, instead of investing more, currently racing to cut their own costs. Microsoft continues its streak of investing in European countries, announcing an AI infrastructure bid in Spain coming along with a $2.1 billion investment. This of course follows their recent announcement of a $3.45 billion AI-focused investment in Germany, and will be centered around AI and cloud infrastructure. Lastly, today, one that I'm only just starting to see talked about on

Starting point is 00:05:15 Twitter a little bit. The University of Pennsylvania's Penn Engineering Today wrote a blog post at the end of last week called New Chip Opens Door to AI Computing at Light Speed. The piece begins, Penn Engineers have developed a new chip that uses light waves rather than electricity to perform the complex math essential to training AI. The chip has the potential to radically accelerate the processing speed of computers while also reducing their energy consumption. They call this a silicon photonic or SiPh chip, and it's based on recent research around manipulating materials at the nanoscale to perform mathematical computations using light. And so we close this brief basically where we began, with the continued focus on AI chips. In many ways, these two bookending stories represent the spectrum

Starting point is 00:05:54 of what we're seeing right now. On the one hand, people trying to solve the compute access problem by throwing money at it and just building out more infrastructure, versus on the other hand, thinking in fundamentally new ways about the actual underlying technology itself. It is almost certain that as the AI revolution continues, we will see immense developments at both ends of this spectrum. For now, though, that is going to do it for today's AI Breakdown Brief. Up next, the main AI Breakdown. Hello, AI friends. Quick note before we get back into the show, we have just opened up registration for the March edition of the AI Education Beta Program.

Starting point is 00:06:27 The whole philosophy of this program is to get you learning by doing. So we have short tutorials, think three minutes, five minutes, seven minutes, around specific features and use cases in AI, followed by challenges that are step-by-step instructions that get you actually using the most interesting and relevant tools. We have now built out a library of more than 100 of these lessons and step-by-step companion instructions, and we'll be dropping more each week. For the first time, we'll also be moving beta users this month to a new dedicated platform where you can access that library of content, build lists of lessons you want to learn from

Starting point is 00:06:59 later, and other features that we hope will help make this the single best AI learning experience available. If you want to check it out, go to bit.ly slash AI Beta. That's bit.ly slash AI Beta. Registration is only open this week until next Monday, so go check it out. A quick message before we get back to the episode today. At this point, you guys know that Notion is one of the major tools that I use day in and day out across the Breakdown network, the AI Education Beta project, basically anything that I'm doing

Starting point is 00:07:30 in any sort of professional or entrepreneurial endeavor is going to be anchored by Notion. Now, you also know that one of the big themes that I keep talking about for 2024 when it comes to artificial intelligence is the integration of AI into our workflows. I think in many ways it's not just about which third-party AI tool is best for any given use case, but how they actually fit into what we're doing in ways that are actually time-saving. And that's why I love that Notion now has AI so deeply integrated across its entire suite of tools, which means that it's everywhere in your entire workspace. Now, for those of you who don't know, Notion combines your notes, documents, and projects into one space that is simple and beautifully designed. It's your one

Starting point is 00:08:08 place to connect teams, tools, and knowledge, so you can do your most meaningful work. Unlike other solutions, it doesn't have you bouncing between six different apps. It is seamlessly integrated, infinitely flexible, and incredibly easy to use. Now, with the new fully integrated Notion AI, you can work faster, write better, think bigger, and take care of tons of tasks that might normally take you minutes or hours in just seconds. One of my favorite use cases is to use Notion for brainstorming. So, for example, what would a great launch strategy be for some project? Use Notion AI to help you think through all of the different dimensions of how you could tell that story. Now, the proof is in the pudding and Notion is used by over half of Fortune 500 companies.

Starting point is 00:08:46 And most importantly, probably for you guys, the teams that do use Notion send less email, cancel more meetings, save time searching for work, and reduce spending on tools. Right now, you can try Notion for free when you go to Notion.com slash AI breakdown. That's all lowercase letters, notion.com slash AI breakdown, to try the powerful, easy-to-use Notion AI today. And of course, when you use our link, you're supporting the show. One more time, that's Notion.com slash AI breakdown. On a recent video about Sora that I published,

Starting point is 00:09:17 YouTube commenter coldly analytical wrote, I regard February 15th, 2024 as AI's Day Zero. Sora and Gemini 1.5 both announced on the same day, and both pushing us from the beta test phase into the AI is a real usable technology zone. I think it's an astute comment, and quietly, there is another leg of the next-phase-of-AI stool that came over the weekend. It started for most with a tweet from Matt Shumer, the CEO of HyperWrite, who said, Wild tech you have to try. Groq, groq.com. They are serving Mixtral at nearly 500 tokens a second. Answers are pretty much

Starting point is 00:09:52 instantaneous, opens up new use cases, and completely changes the UX possibilities of existing ones. Matt was the first to notice a live demo from what claims to be the world's fastest LLM. When you go to groq.com, it says, we'd suggest asking about a piece of history, requesting a guide on how to achieve your New Year resolution, or copy and pasting in some text to be translated by prompting make it French. This alpha demo lets you experience ultra-low latency performance using the foundational LLM, Llama 2 70B, created by Meta AI, running on the Groq LPU Inference Engine. Now, they actually give you two options for model. You can use either Mixtral or Llama 2 70B, but suffice it to say that the speed at which it generates responses has people's

Starting point is 00:10:31 minds scrambling. A little later over the weekend, Matt, again, writes, the first public demo using Groq, a lightning fast AI answers engine. It writes factual, cited answers with hundreds of words in less than a second. More than three quarters of the time is spent searching, not generating. The LLM runs in a fraction of a second. So what is going on? Well, Groq, on its website, in its Why Groq section, says, Groq is on a mission to set the standard for GenAI inference speed, helping real-time AI applications come to life. In its FAQ section, Groq writes, what is the LPU Inference Engine? An LPU Inference Engine, with LPU standing for language processing unit, is a new type of end-to-end processing unit system that provides the fastest inference for computationally intensive applications

Starting point is 00:11:10 with a sequential component to them, such as AI language applications or LLMs. On the question of why it is so much faster than GPUs for LLMs and GenAI, Groq writes, the LPU is designed to overcome the two LLM bottlenecks, compute density and memory bandwidth. An LPU has greater compute capacity than a GPU and CPU in regards to LLMs. This reduces the amount of time per word calculated, allowing sequences of text to be generated much faster. Additionally, eliminating external memory bottlenecks enables the LPU Inference Engine to deliver orders of magnitude better performance on LLMs compared to GPUs. Jay Scambler on Twitter wrote a slightly more comprehensive explanation which I found useful.

Starting point is 00:11:46 Jay writes, Groq is serving the fastest responses I've ever seen. We're talking almost 500 tokens a second. I did some research on how they were able to do it. Turns out they developed their own hardware that utilizes LPUs instead of GPUs. Groq developed a novel processing unit known as the tensor streaming processor or TSP, which they categorize as a linear processor unit or LPU. Unlike traditional GPUs that are parallel processors with hundreds of cores designed for graphics rendering, LPUs are architected to deliver deterministic performance for AI computations. The LPU's architecture is a departure from the SIMD, single instruction multiple data, model used

Starting point is 00:12:19 by GPUs, and favors a more streamlined approach that eliminates the need for complex scheduling hardware. This design allows every clock cycle to be utilized effectively, ensuring consistent latency and throughput. For developers, this means that performance can be precisely predicted and optimized, which is critical in real-time AI applications. Energy efficiency is another area where LPUs shine. By reducing the overhead of managing multiple threads and avoiding the underutilization of cores, LPUs can deliver more computations per watt. Groq's innovative chip design allows multiple TSPs to be linked together without the traditional bottlenecks found in GPU clusters, making them extremely scalable. This enables linear scaling of performance as more LPUs are added, simplifying the hardware requirements

Starting point is 00:12:55 for large-scale AI models, and making it easier for developers to scale their applications without re-architecting their systems. So what does this all mean? LPUs could provide a massive improvement compared to GPUs for serving AI applications in the future. If anything, it will be great to have alternative high-performing hardware since A100s and H100s are so in demand. So even if all of that doesn't make it necessarily too much clearer,

Starting point is 00:13:15 what you should be taking away is that there is a new hardware approach underlying this. This is not a different model akin to GPT-4 or Gemini or anything like that. This is a new approach to processing that lies underneath. Carlos Perez at IntuitMachine explains further. He writes, Groq is a radically different kind of AI architecture. Among the new crop of AI chip startups, Groq stands out with a radically different approach centered around its compiler architecture for optimizing a minimalist yet high performance architecture. Groq's secret sauce is this compiler-first method that shuns complexity in favor of tailored efficiency. At the heart of Groq's

Starting point is 00:13:46 architecture is an almost surprisingly bare-bones design that does away with unnecessary logic in favor of raw parallel throughput. The hardware itself is comparable to an ASIC, an application-specific integrated circuit finely tuned for machine learning. However, unlike a fixed function ASIC, Groq leverages a custom compiler that can adapt and optimize across different models. It is this combination of a streamlined architecture and an intelligent compiler that sets Groq apart. The key insight is that many AI chip stack components, like GPUs, bring extraneous hardware and bloat. Groq returns to first principles, recognizing that machine learning workloads are about

Starting point is 00:14:17 massive parallelism over simple data types and operations. By eliminating generic hardware and even concepts like locality, the design maximizes throughput and efficiency. This is enabled by Groq's compiler that sits between software frameworks like TensorFlow and the hardware. The compiler analyzes and optimizes neural network graphs, tailoring and mapping them to the underlying architecture for accelerated execution. It breaks computations into the smallest operations to unlock parallelism. The compiler also enables capabilities like batch size 1 inference that ensures all hardware is usefully leveraged. Critically,

Starting point is 00:14:45 Groq built its compiler before even finalizing the hardware design. The software insights directly inform the architecture. This co-design process allowed inference-specific optimization without legacy limitations. The innovative compiler-first methodology allows custom optimization that balances flexibility with performance. So basically the idea here is that whereas, for example, Nvidia GPUs do lots of different things. They're used to run gaming. They were for a time used for crypto mining.

Starting point is 00:15:09 The Groq chip is totally optimized for the generative AI world. Carlos used that phrase first principles, and that's really what this seems like, a design from the ground up based on this particular use case, which is admittedly a set of different use cases. Still, for most people, this is really just all about speed. Dina Yerlin writes, side-by-side Groq versus GPT-3.5, completely different user experience, a game changer for products that require low latency.
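To put the speed figure in concrete terms, here is a minimal Python sketch that converts a decode rate into the time it takes to stream a full answer. The 500 tokens-per-second rate is the "nearly 500" figure quoted in the episode. The 300-token answer length and the 50 tokens-per-second comparison rate are assumptions chosen purely for illustration, not measurements of any particular service, and the function name is hypothetical.

```python
def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Seconds needed to stream num_tokens at a steady decode rate.

    Ignores network latency, queueing, and prompt processing time.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


ANSWER_TOKENS = 300      # assumed answer length, for illustration only
GROQ_DEMO_RATE = 500.0   # tokens/s, the "nearly 500" figure quoted in the episode
SLOWER_RATE = 50.0       # hypothetical slower service, not a measured number

if __name__ == "__main__":
    for label, rate in [("500 tok/s", GROQ_DEMO_RATE), ("50 tok/s", SLOWER_RATE)]:
        seconds = generation_seconds(ANSWER_TOKENS, rate)
        print(f"{label}: {seconds:.1f} s for {ANSWER_TOKENS} tokens")
```

Under these assumptions the same answer arrives in well under a second at the quoted rate versus several seconds at the slower one, which is the difference between a response that feels instant and one the user has to wait for.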

Starting point is 00:15:31 Ethan Mollick writes, quote, GPT-3.5 class LLMs are too slow. Sure, that was true last week. Here is Groq running Llama 2. My favorite comment on this when I posted the same video on LinkedIn, it is too fast, it shouldn't be this fast. Tom Osman writes, love seeing all the Groq demos on the feed, but is it only good for working with LLMs?

Starting point is 00:15:49 Answer? Nope, it's insane at other stuff too. Watch this clip from Groq Labs that shows it running StyleCLIP on an image to create eight different styles, in 1024 pixels, in just 0.185 seconds. Basically, this is showing an image generation capacity that is equally insanely impressively fast. Gabor Cselle writes, comparing time to complete answer for a simple code debugging question. Groq wins on speed, 10x faster than Gemini, 18x faster than ChatGPT, although Gabor did say that Gemini wins on quality of answer. Some people think this is so disruptive that we're going to see Groq get scooped up by one of the big AI labs. Faren Mather

Starting point is 00:16:23 writes, prediction, Groq will get a $10 billion acquisition offer within this month. Groq's Tom Ellis responded and said, add two zeros and we'll think about it. Just kidding, we're not for sale. But we are building out more and more infra every day to serve customers with the lowest latency LLMs available. Now, to the extent that there has been any critique of this or skepticism, it's been around the potential price. Felix Red Panda writes, how does Groq make economic sense? One of these cards costs 20k and has 0.23 gigabytes of memory. So do people buy 320 of these cards and fill two full racks with them to serve a single Llama 70B for 10 million including servers? That can't be how this works, right? Bindu Reddy from Abacus, however, bites back, saying, I'm seeing lots of takes claiming that Groq does not make

Starting point is 00:17:02 economic sense. Barely anything makes economic sense at the beginning. LLM companies still lose billions. Vision Pro is too expensive. Even the much-loved Cybertruck is too expensive. The cool thing about Groq is the blazing fast inference. The economics will make sense in time. Now others are just thinking about what opportunities this opens up. Responding to a question, what are some novel use cases now possible because of this speed, AI solopreneur levelsio says, So I thought about this. If you hook up a just as fast text-to-speech model and fast Whisper speech-to-text model to Groq, you can have instant conversations from human to AI and back without any delays, like ChatGPT's TTS five second delay. Arvid Kahl responded to that. AI could respond

Starting point is 00:17:40 while you're still speaking the last syllables of your word. I think we have to rethink what this could be used for. Honestly, I feel the humans are too slow to be even interesting for this kind of speed. Can you imagine how fast this thing could code as an autonomous agent? Sully Omar writes, imagine the possibilities with a model that has 1 million context like Gemini Pro 1.5, instant and cheap inference like Groq, Groq, like reasoning. We'll be building insane things. And Andrew V responds, imagine how dated this post will be in a year and what our imagination will be dreaming of then. Now, obviously, people are only just starting to dig into Groq and there will be a lot more to talk about in the coming days and weeks,

Starting point is 00:18:14 but you can go check it out right now at groq.com. This is definitely one that is benefited by a live demo, so I hope you go check it out. For now, though, that is going to do it for today's AI Breakdown. Until next time, peace.
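
The back-of-envelope question Felix Red Panda raised in the episode (320 cards at 20k each, 0.23 gigabytes of memory per card, one Llama 70B) can be checked with a few lines of Python. This is only a sketch of that arithmetic: the card price, memory, and card count are the numbers quoted in the post, while the bytes-per-parameter values are assumptions about weight precision that the episode does not state, and the helper names are hypothetical. It counts model weights only and ignores activations, caches, and any replication across cards.

```python
import math

CARD_PRICE_USD = 20_000   # per-card price quoted in the post
CARD_MEMORY_GB = 0.23     # per-card memory quoted in the post
NUM_CARDS = 320           # card count quoted in the post


def cards_needed(params_billions: float, bytes_per_param: float,
                 card_memory_gb: float = CARD_MEMORY_GB) -> int:
    """Minimum number of cards whose combined memory holds the weights alone."""
    weights_gb = params_billions * bytes_per_param  # 1e9 params x bytes each = GB
    return math.ceil(weights_gb / card_memory_gb)


def hardware_cost_usd(num_cards: int, card_price_usd: int = CARD_PRICE_USD) -> int:
    """Cost of the cards only, excluding servers, racks, and networking."""
    return num_cards * card_price_usd


if __name__ == "__main__":
    total_memory_gb = NUM_CARDS * CARD_MEMORY_GB
    print(f"{NUM_CARDS} cards hold about {total_memory_gb:.1f} GB in total")
    print(f"Cards alone cost ${hardware_cost_usd(NUM_CARDS):,}")
    # Assumed precisions: 1 byte per parameter (8-bit) and 2 bytes (16-bit).
    for bytes_per_param in (1.0, 2.0):
        n = cards_needed(70, bytes_per_param)
        print(f"70B params at {bytes_per_param:.0f} byte(s)/param: at least {n} cards")
```

Under the assumed 8-bit weights, a 70-billion-parameter model needs roughly the same number of cards as the 320 quoted in the post, and the cards alone come to $6.4 million, which is in the same range as the "10 million including servers" figure once the remaining hardware is added.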
