@HPC Podcast Archives - OrionX.net - HPC News Bytes – 20260406
Episode Date: April 6, 2026 - Nvidia invests $2bn in Marvell - NVLink Fusion - Why big companies invest in small ones - Andrew Ng on anti-AI activism - 32, 16, 8, TurboQuant at 3.5 bits... do I hear 1? - PrismML’s Bonsai-8B 1-bit LLM [audio mp3="https://orionx.net/wp-content/uploads/2026/04/HPCNB_20260406.mp3"][/audio] The post HPC News Bytes – 20260406 appeared first on OrionX.net.
Transcript
Welcome to HPC Newsbytes, a weekly show about important news in the world of supercomputing,
AI, quantum computing, and other advanced technologies.
Hi, everyone. Welcome to HPC Newsbytes. I'm Doug Black, and with me is Shaheen Khan.
Early last week, Nvidia announced it has invested $2 billion in Marvell as part of what Jensen Huang said
is the company's effort to increase its AI TAM, or total available market,
by making it easier for customers to use the custom AI chips that Marvell designs with
Nvidia's networking gear and central processors.
Put another way, some customers are opting for custom processors because they're cheaper
than Nvidia's pricey ones.
This is the market factor Nvidia is tapping into.
Industry observers have said price is a potential Nvidia liability or vulnerability in the
long run, so this may be a response to that concern.
In its story on the deal, Reuters quoted analyst Jacob Bourne at eMarketer as saying,
it broadens Nvidia's ecosystem to include more specialized silicon, which helps Nvidia remain a key
access point for increasingly diverse AI workloads.
And that's a key point.
The AI market is evolving and broadening, and Nvidia wants to be everywhere AI is going.
Shaheen, Nvidia never seems to take its foot off the gas pedal.
Most other tech companies we've seen over the decades that have reached Nvidia's level of dominance
haven't maintained the entrepreneurial drive we're seeing out of this company.
Go big or stay home is a common mantra in Silicon Valley, and they exemplify it.
Being on the leading edge of a history-defining technology certainly helps.
If you already have a market valuation that is over $4 trillion, why would anyone buy your stock,
if not because they think it's headed to even higher values?
And doing that requires you to grow the TAM.
And they are doing that in multiple directions, software, edge, robotics, investments in
new clouds, sovereign AI, and proliferating their technologies through licensing and partnerships.
A larger topic perhaps is why large companies invest in joint ventures or startups.
And that is fundamentally to accelerate the creation or meeting
of market demand, to gain access to a broad range of emerging but unproven technologies,
and ultimately to reduce risk, balancing their own scale with external agility, so to speak.
To say it more explicitly, the motivation to do so can include time to market, since investing
achieves more or less instant access to developed technologies instead of waiting for
internal R&D. Shared costs, where partnering with venture investors distributes
financial risk and also some financial reward, but the balance would be in your favor.
Talent acquisition, since startups can attract specialized, culturally aligned entrepreneurial talent
that is often not available to large companies, even the ones doing very well.
Market validation, because startups prove their value in real world conditions and avoid
internal echo chambers.
Strategic alignment, since investing, versus simply buying a product,
creates deeper partnerships and also gains more influence than just a client-vendor relationship.
Supply chain security, because as an investor, you can do more to ensure critical technologies
or components remain viable and available, and financial upside, where the success of the startup
yields equity returns, not just procurement benefits. We've seen this in spades with neoclouds.
Together, these factors can make investing a faster, lower risk and more strategically flexible approach
than building everything internally. And Nvidia has done that with Marvell, with Synopsys,
and a raft of others. We saw last week that AI entrepreneur and Stanford computer science professor
Andrew Ng has a LinkedIn post discussing anti-AI political tactics. According to Ng,
surveys in the UK have found that the most effective anti-AI messaging focuses on
AI-enabled warfare and environmental damage, along with AI eating jobs and causing harm to
children. While those messages are seen as the most effective, Ng said the anti-AIs, if you
will, quote, found that saying AI will cause human extinction has largely failed. Doomers
were pushing this argument a couple of years ago, and fortunately our community beat it back,
unquote. All this is interesting, Shaheen, because we continue to hear all of those negative messages
from AI doomers. I saw one last week from opinion writer Noah Smith who said, quote,
AI has the worst sales pitch I've ever seen. Our product will make you economically useless and
possibly kill you. This, Smith asserted, is not a value proposition. That is a really clever way of
putting it, but it's hard to say AI has a marketing problem when so much money is being directed to it.
But yes, there are important concerns. I mean, anytime you say a technology is going to redefine
humanity, you have to expect a bunch of concerns. And we should note that Ng owns up to
at least some of those concerns. I'll just read a passage from his article.
To be clear, I find AI-enabled warfare alarming.
We need to continue serious efforts to monitor and mitigate the environmental impact of AI.
Any job losses are tragic and hurt individuals and families.
And as a father, I hold dearly the importance of every child's welfare.
Each of these topics deserves serious attention and treatment with the greatest of care, end quote.
He mentions also companies that have blamed AI for layoffs
that they would have done anyway, or AI companies that blame competing open source AI projects
for making AI unsafe. So, okay, if we all agree there are big concerns, then any difference
is about what we think should be done to eliminate or at least mitigate risks. It would be good
for the pro-AI community to publish their proposed plans to address those concerns.
AI companies have said that they want and welcome some kind of oversight and regulation,
and one hopes it doesn't come across as reverse psychology,
a way of gaining trust while they believe they will manage it well without regulations.
But as we've said here before, the global scene is more complicated
and must rely on norms and not just regulations.
The ideal scenario is to go as fast as possible without losing control,
and there lies a fundamental complexity.
With the rise of AI, we've also seen the emergence of lower-precision computing: 32-bit, 16-bit, and even 8-bit.
Fewer bits means faster processing.
But this only works for workloads where lower-precision computing can still deliver accurate results.
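The bits-to-footprint relationship is simple arithmetic. As a rough illustration (the 8-billion-parameter model size is an assumption chosen to match the Bonsai-8B discussion later in the episode; only the 16-bit, 16 GB figure is stated in the show), here is what weight storage looks like at the precisions mentioned:

```python
# Approximate weight-storage footprint of an 8B-parameter model
# at various precisions. Illustrative arithmetic only; real models
# also carry activations, KV cache, and metadata not counted here.
PARAMS = 8_000_000_000

def weight_gb(bits_per_weight: float, params: int = PARAMS) -> float:
    """Weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9

for bits in (32, 16, 8, 3.5, 1):
    print(f"{bits:>4}-bit: {weight_gb(bits):6.2f} GB")
```

At 16 bits that works out to 16 GB, and at 1 bit to about 1 GB, which is why each halving of precision is such an attractive target.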
Just last week, we covered the new TurboQuant algorithm from Google that can compress AI inference into 3.5-bit arithmetic:
three bits for all the data and one bit for half of it as error correction, all based on the usual,
very complex math. So last week, PrismML, a Caltech-originated startup, introduced Bonsai,
a one-bit model that they say addresses deep structural constraints on AI brought on by big models
with trillions of parameters that require big data centers, more GPUs, more memory, more electrical
power and more cost. Of course, big models deliver tremendous value, but PrismML is focused on
uses and computing environments that aren't so big, such as phones, laptops, vehicles, robots,
and edge devices. They say PrismML is building, quote, concentrated intelligence guided by a core
belief that the next major leaps in AI will be driven by order of magnitude improvements in
intelligence density, not just sheer parameter count. Their Bonsai 8B is a one-bit model designed
across the entire network. It's a true one-bit model, they say, end-to-end across 8.2 billion parameters.
So, Shaheen, what do you make of all this? It's pretty exciting to have these significant announcements
so close together, though they've all clearly been working on these for many months. So maybe the
first announcement accelerated the second. Anyway, PrismML
introduces a new approach to AI that appears both credible and potentially very important.
The key idea is what, as you said, they call intelligence density, which is an efficiency
metric rather than a pure capability metric. Conceptually, it is the ratio of how well a model
performs relative to its size and resource cost. More specifically, it measures observed
performance like accuracy, reasoning ability, and task success on benchmarks per unit cost,
like the number of parameters, memory footprint, compute, or energy. It is not attempting to measure
intelligence, but rather how efficiently a model converts resources into performance. A second key idea
is standardizing on end-to-end one-bit operations, rather than training in higher precision and
compressing afterwards. This makes training significantly harder and is likely where much of their
innovation would be, but avoids the losses associated with post-training quantization. To enable this,
they use grouped scaling, where blocks of weights share higher precision scaling factors to preserve
accuracy. Remember that quantization is a specific method of lossy compression used in AI. Then they
optimize inference for the low-power processing of edge devices. They reported a model about 1.15
gigabytes in size versus 16 gigabytes for a 16-bit equivalent: 8 billion parameters at 2 bytes each.
So that's 14x smaller. It is unsurprisingly also faster and more energy efficient than traditional
16-bit models. They report 43 tokens per second on a high-end iPhone,
and a 4 to 5x improvement in the joules-per-token energy metric. That's pretty significant.
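The grouped-scaling idea described above can be sketched in a few lines. This is an illustrative absmean-style binarization of the kind BitNet popularized, not PrismML's actual scheme, which has not been published in detail; the group size of 128 and the fp16 scale factors are assumptions:

```python
import numpy as np

def binarize_grouped(w: np.ndarray, group_size: int = 128):
    """Quantize weights to {-1, +1}, with one higher-precision
    scale factor shared by each group of weights."""
    groups = w.reshape(-1, group_size)
    # One scale per group: mean absolute value (absmean scaling).
    scales = np.abs(groups).mean(axis=1, keepdims=True).astype(np.float16)
    # The 1-bit codes: just the signs of the weights.
    signs = np.where(groups >= 0, 1.0, -1.0).astype(np.float32)
    return signs, scales

def dequantize(signs: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Reconstruct approximate weights: sign * group scale."""
    return signs * scales.astype(np.float32)

rng = np.random.default_rng(0)
w = rng.normal(size=1024).astype(np.float32)
signs, scales = binarize_grouped(w)
w_hat = dequantize(signs, scales)

# Storage cost: 1 bit per weight plus one fp16 scale per 128 weights.
bits_per_weight = 1 + 16 / 128
print(f"effective bits per weight: {bits_per_weight:.3f}")
```

The per-group scales are what preserve the magnitude information that pure sign quantization throws away, at a cost of only a fraction of a bit per weight.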
This approach differs from prior low-precision efforts such as BitNet
by enforcing binary weights across all layers
and then co-optimizing the full software stack.
If validated, PrismML could signal a shift
from bigger models to more efficient ones,
with implications for both edge inference and data-center-scale AI.
Definitely one to watch.
All right, that's it for this episode.
Thank you all for being with us.
HPC Newsbytes is a production of OrionX.
Shaheen Khan and Doug Black host the show.
Every episode is posted on OrionX.net.
If you like the show, please rate and review it.
Thank you for listening.
