Big Technology Podcast - $100 Million AI Engineers, Vending Machine Claude, Legend Of Soham
Episode Date: July 4, 2025Ranjan Roy from Margins is back for our weekly discussion of the latest tech news. We cover: 1) Meta's reported $100 million offers to AI engineers 2) If those reports are false, who planted the rumor...? 3) Why talent might be all that matters in AI right now 4) Will Meta's bet work? 5) Anthropic's project vend 6) If AI can't stock a fridge, will it take your job? 7) Claudius' identity crisis 8) ChatGPT's hilarious Wealthfront hallucination 9) The Legend of Soham 10) Happy July 4th! --- Enjoying Big Technology Podcast? Please rate us five stars ⭐⭐⭐⭐⭐ in your podcast app of choice. Want a discount for Big Technology on Substack + Discord? Here’s 25% off for the first year: https://www.bigtechnology.com/subscribe?coupon=0843016b Questions? Feedback? Write to: bigtechnologypodcast@gmail.com
Transcript
Discussion (0)
AI engineers are getting athlete pay, anthropic setup clawed, allowing it to run a vending machine in an experiment that tells us a lot about where AI is today and where it's going.
And So Hamperec has a job at so many companies, there's a chance he's working at yours as well.
That's coming up on a Big Technology Podcast Friday edition right after this.
Welcome to Big Technology Podcast Friday edition where we break down the news in our traditional cool-headed and nuanced format.
that we have so much to speak with you about today, including the news that Mark Zuckerberg
may be offering contracts of up to 100 million or more to AI engineers who want to come
on board to his super intelligence team.
Of course, Facebook disputes that or meta disputes that.
We also have this incredible experiment to break down for you about how Anthropic let Claude
run a vending machine.
And then, of course, we've got to talk about so on.
It's taken so many jobs, especially with YC companies, that who knows, maybe he's working
for yours as well. Joining us, as always, on Fridays to do this is Ron John Roy of Margins.
Ron John, great to see you. Welcome back.
Good to see you. I'm in a San Francisco hotel room right now, but I regret to inform you.
I'm not here to discuss my new $100 million pay package from Zuck. I'm not on the list.
I'm not on the list yet.
We might be able to podcast our way into it. Never say never.
I'll take a cool 50, Mark.
Just a cool 50, okay.
Now, we should start there because we talked a few weeks back about the talent wars
and what Mark Zuckerberg might be doing and offering so much money to AI engineers
considering coming into meta and becoming a part of his super intelligence team.
And in the two weeks since that discussion has really heated up.
So we now have news from Wired.
It says, here's what Mark Zuckerberg is offering top AI talent.
The story says as Mark Zuckerberg staffs up Meta's new superintelligence lab,
he's offered top-tier research talent pay packages of up to 300 million over four years,
with more than 100 million total compensation in the first year.
Meta denies the idea or the numbers.
It says these statements are untrue.
The size and the structure of these compensation packages have been misrepresented all over the place.
some people have chosen to greatly exaggerate what's happening for their own purposes.
I mean, I don't know, Ron John, how do you get multiple people saying that they have a similar-sized deal?
I think the opening I reported 10 of these deals?
How does that happen?
And how do you end up with a denial there?
Yeah, I think let's get to what it actually means for the industry.
Second, but first, I'm still kind of curious about Andy Stone, the meta-spokesperson's response in terms of saying that the statements are untrue.
and like this kind of blanket denial
and saying that people have chosen
to greatly exaggerate what's happening
for their own purposes,
because how does it help an open AI?
In my mind, I get there's the downside of this
that potentially the market might get spooked
that meta is kind of spending too frivolously,
but in reality, I've to admit,
this kind of makes me like think,
you know, like war rage Zuckerberg is here
and he's ready and he's going to win
AI at whatever cost. So to me, it's almost a positive signal. I don't know why they're denying
it. Well, I mean, I think it makes an internal cultural thing a bit of a problem. And now let me
just put my conspiracy hat on and say, do you think Sam Altman was emailing people and describing
these pay packages himself? Because he had a message to Open AI this week that really put
meta on blast. He's not happy that meta has been recruiting some of his top.
people. He says to the Open AIT, missionaries will beat mercenaries. Meta is acting in a way that
feels somewhat distasteful. What meta is doing will, in my opinion, lead to very deep
cultural problems. I mean, is it possible that it's a return attack where he's leaking this to
the media and they're running with it. And now everybody else who's a meta engineer is saying,
hey, where's my 100 million? Because in the wire story that I quoted, they said a senior engineer
makes $850,000 dollars per year.
Now, I'm not crying for this engineer.
But if that is the salary and you have somebody coming in who does similar work and they're
making what you think is $100 million, maybe you want to go to Open AI.
Okay, okay.
Actually, that is an interesting theory.
It's almost so logical that it almost kind of like leaves the realm of conspiracy and
actually I could see it happening.
Again, it would be so incredibly rich the idea that open.
Open AI, a company that has, you know, spent at all costs raised ungodly amounts of money,
is losing ungodly amounts of money, kind of takes this approach at a competitor.
But I can definitely see that, that it would cause a bit of internal strife on the meta side.
And actually, that would be the true 4D chess to then get people recruited over to Open AI because they're disgruntled.
Some people have chosen to greatly exaggerate what's happening for their own purposes.
It's just one of those statements that says a lot of them saying anything.
Andy Stone knows exactly what's happening.
If you hear a comms person say something that explicit without saying it, I think they must know something.
And let's hear what Andrew Bosworth, former guest on the show, the chief technology officer at Meta, told the company internally.
He said, look, guys, the market's hot.
It's not that hot, okay?
So it's just a lie.
We have a small number of leadership roles that we're hiring for, and those people do command
a premium.
I noted that OpenAI is countering the offers.
I mean, if you get even close, it's a truly absurd amount of money.
Satya Nadella is making $79.1 million this year.
So could you be like the OpenAI researcher who worked on 04, and now you're going to make more
than Satya?
It's so on face, it seems completely absurd and ridiculous.
But then in the grand scheme of things, if those 10 people are the, you know, like difference
between building the next great model, especially that meta has been, you know, on its back foot
a bit, it actually from like a pure ROI standpoint could make sense, again, as ridiculous as it
sounds. And I know like there's a lot of comparisons that AI labs are starting to look like
sports teams. But in reality, those are the decisions that if,
an individual can have that great of an impact on your overall business, it makes perfect
sense. Again, is that the way that's going to play out where we'll get into what this means
for like training and where the next phase of growth will be? But it's not absurd given the
size of the opportunity. It's absurd if like if we believe that one to 10 people can actually
be, make or break things for them. Yeah, I mean, remember, meta is a company that's
lost, what, 15 billion a year? I might be, you know, exaggerating a little bit, but I think this is
directionally accurate on the Metaverse. Yeah. So if you think about it, if you want to build
a super team of, let's say, I don't know, 10, 20 AI researchers and you want to give them
a hundred million a year. So now you're spending two billion to advance the state of the art
in AI for two years. I mean, I mean, per year, that seems fairly reasonable compared to these other
bets. I think that appetite for risk, again, as we said, losing that much money on the
metaverse, on reality labs, and whatever it was exactly, again, Mark Zuckerberg is not
afraid to take risks. Every company and everyone has identified whoever kind of wins the
AI battle will win the next major phase of growth in overall markets. Again, it's up for
debate. Is it truly going to happen at the research and model layer? Or will it,
happen in other parts of the overall AI stack. But I think he's serious, whatever it is. I mean,
the move for Alexander Wang and what was it, 15 billion? It's like 15 years. Yeah,
15 billion, which was an aqua-hirezition, trademarked Alex Cantroitz. They've shown they're not
playing around right now. So all of these acquisitions, I mean, or direct hirings at insane levels,
they're doing right now and they're showing that they're not going to fall back any further.
Yeah, this is from Mark Pincus, the founder of Zinga. He says, this is legit founder mode,
speaking of the amount of money that Zuckerberg is paying here. Buying the talent from OpenAI is
cheaper than the company. Only a founder would or could do this and only if they control their
board. I think that's a great point. Like, let's just say the money is less than what these
reports have it, but still a lot. You don't see any other company.
companies doing this. I mean, you think about it with XAI, Elon is the richest man in the world.
He's not doing this. I think this is a pretty solid and bold play from Zuckerberg.
Yeah, I just went to meta-AI just to ask this. And meta-reality labs has, and I actually
love that meta-a-I says meta-reality labs division has been hemorrhaging money with significant losses,
but it's lost $42 billion since 2020, $17.7 billion last year.
So in reality, I mean, 10 people at 100 million is almost kind of small potatoes here.
Yeah, it's child's play.
I mean, the thing is what it does culturally.
But here's the question.
Is it worth the risk?
So you mentioned that some AI engineers are being paid like athletes.
And there is a great piece by Dave Kahn, who's a partner at Sequoia, why AI Labs are starting to look like sports teams.
And I think we should just spend a couple minutes or even a little bit longer hovering.
on this piece because I think it really details what is going on so well and explains why the
investments in talent are what we're starting to see right now. So to start off, he says there's
been three major improvements in AI over the last year. First, coding and AI has really taken
off. A year ago, the demos for these products were mind-blowing. And today, the coding AI space is
generating something like a $3 billion run rate and revenue. Okay, so that's one. So this is
working in coding. The second change is that reasoning has found
product market fit and the AI ecosystem has gotten excited about a second scaling law around
inference time compute. And third, there seems to be a smile curve around chat chip ET usage
where this new behavior is getting ingrained in day-to-day life. I think smile curve basically
means like you start using it and then you casually use the product so your usage goes a bit
down and then as you start to find more utility, your usage goes up. So your curve looks like a
smile. Is that how you read it? Yeah, that's how it looks and how I'm reading it. And it's correct.
I think I agree. This was a really smart piece again on where the market is today and where
it's going and how this can possibly explain. And again, I did love that he recognizes,
though I think Dave Kahn is both team model and team product. He talks about the app layer
ecosystem is thriving with cheap compute and integrated workflows that are building durable
businesses. So basically, consumers are starting to get it. You know, like coding has found very clear
revenue generation. Reasoning, as he said, found product market fit. So what's next? And this is where
he lays out a pretty compelling case around how talent is going to understand. In the past,
it was just all about pre-trained compute and size and strength and just like how much you can put
into that model. But we've talked about this a lot on the podcast, like the actual training
techniques, becoming smarter, even it was Sergey Brin, I think, who said in his interview
with you that it's going to be algorithmic progress, not compute, exactly. Yeah, yeah. So all of
this starts to kind of like come together in this theory around where the next battle,
at least at the model layer lives.
And if that is the case, maybe you can start to build out the idea that 10 smart people
can make or break your business versus buying however many Nvidia chips and like, you know,
purely spending money on the compute.
Yeah, and I think it's worth reading exactly the way he puts it in his piece.
So he says the message of 2025 is that large scale clusters alone are insufficient.
Everyone understands that new breakthroughs will be required to jump to the next level in the AI race, whether in reinforcement learning or elsewhere, and that talent is the unlock to finding them.
I'm just going to pause here and say, yes, this is what we've been hearing from everyone.
in that conversation with Sergei, where he said that the algorithms are going to be the thing that takes AI to the next level and not necessarily compute, Demis Savas also said there's going to be another couple breakthroughs that the AI industry is going to need in order to keep advancing toward AGI or whatever you want to call it, more powerful artificial intelligence. So it is these algorithmic improvements to that will get the industry moving forward. And what do you need to get there? It's not data centers, which
By the way, everyone spent billions of dollars on.
It's the talent to be able to make those breakthroughs themselves.
So this is what he says.
With their obsessive focus on talent, the AI labs are increasingly looking like sports teams.
They are each backed by a mega-rich tech company or individual.
Star players can command pay packages in the tens of millions, hundreds of millions,
or for the most outlier talent, seemingly even billions of dollars.
Unlike sports teams where players have long-term contracts, AI employment agreements,
are short term and liquid, which means anyone can be poached at any time. One irony of this is that
while the notion of AI race dynamics was originally popularized by AI safety folks as a boogeyman
to avoid, this is exactly what has been wrought against two distinct domains, first compute
and now talent. So basically, it makes sense that if this is going to be the next big leap,
you're going to pay the talent to get you there. And, you know, no matter how,
much talk you have around safety. We're seeing the industry accelerate around talent and around
compute. Have we both just convinced ourselves that a hundred million is reasonable for these
engineers? Because I think I am starting to be convinced of it. Absolutely. Absolutely. Even when
we spoke about it the first time, right? Once Zuckerberg brought Alexander Wing, what did I say on
the show? There's going to be more. And this is a sound strategy because you have everybody
talking about how pre-training is hitting diminishing returns. You have everybody talking
about how data is hitting a wall. And so what do you need? You just need these algorithmic
developments. Now let me ask you this. So I would say, yeah, this is a good bet, but I'm going to
ask you this. Do you think this is a sign that like, and I, okay, I think I haven't answered
this before I ask you, but that the, this AI moment is sort of in the last throws and
sort of just grasping for anything that will allow for improvement given that, like,
the mechanisms that brought it here are starting to tap out. I'm going to give you a strong yes
on this, mainly because, again, as the leader of team product for over team model, I think
this is like a reminder that the core of Silicon Valley is firmly of the belief that this,
the model has to get better and better, and the model will solve everything, and the rest of the layers.
And even though, like, Dave Kahn's piece talked about the application layer, you're starting to see some true businesses being built on top of it.
Like, the idea that they're not still focusing that much on, what are the next chat GPT features, and they are.
And I'm not saying they're not shipping very regularly, but it's just this reminder that, like,
That's where every Silicon Valley leader in this circle is convinced the battle will be won.
And I don't necessarily agree with that.
But, yeah, in this case, to me, because once you made that decision, you have to find the next thing.
And as we said, like pre-trained compute, data centers, all of this is like showing diminishing
return.
So you have to move to the next thing.
And it's talent, right?
Look, I think this is a determination that you have to move to the next thing.
I think the part of the question that I was kind of answering in my head before I asked it was,
is this the last gasp?
And I don't think that's the case.
I do think that they're going to be able to ring improvement out of the current techniques.
At least everybody that I speak with seems to believe that.
But they already, you have to look ahead to the next curve while you're on the first one or while you're on the current one.
And that's, I think, is what's happening.
Yeah.
And then we have a world where imagine this.
talent finds incredibly cheap ways to actually build these models out and then the ultimate
I mean like are they saying there's a potential race to the bottom in the sense that if you
truly make the inference layer that much more efficient and cheaper and the compute side of it
that much more efficient and cheaper I mean it's going to be good for all of us because it means
that all this gets cheaper and people build more on top of it but from an economic standpoint
relative to the investment, will it show return or be worth it? I don't know.
Right. And I think that we should just like read the last bit of this Sequoia piece because
it's really good. And by the way, this came up in the big technology Discord. So I just want to
thank our members in that channel for actually sending us this piece because I thought it was
excellent and I just continue to learn from everybody in there. Here's the end of that piece.
It says it is an intrinsic property of humanity that once critical thresholds are passed,
things all the way to the extreme. We cannot hold ourselves back. And when the prize is as big
as the perceived AI prize is, than any bottleneck that gets in the way of success, especially
an liquid bottleneck like talent, will be pushed to staggering levels. I think that's both true
and also a little, like, concerning. I mean, it certainly does not seem like a positive statement
on humanity overall and our ability to constrain or control ourselves. But what's still
ironic to me or funny to me about this is, you know, an illiquid bottleneck-like talent and the idea
that humans are the key to, rather than like to actually advancing this, rather than at this
point, shouldn't AI itself be good enough to develop the techniques that make AI better?
Well, you're talking about an intelligence explosion and I think that every lab is trying
to engender an intelligence explosion.
but they're not able to as of yet.
But are they going to sort of consolidate release cycles?
Sure, with the help of AI code.
But we are nowhere close, I don't think, to what is it recursively or self-recursive
improving AI models.
But I feel just given where the industry has kind of promised that we are and the type
of advances that are being made, I would like to see them actually kind of apply it to
their own companies and the ways of building. Yeah, and I think that's definitely happening inside
of places like Anthropic for sure, which has this Claude code that was built effectively to
make them better at coding Claude. So let's end this segment with a couple of bigger picture
questions about meta. First is just in terms of culture. Think about what happens to an organization
when you import, I think already it's a dozen or more now multi or DESE millionaire engineers to work
alongside those folks making $850,000 or a million.
Is there going to be a cultural blowup within META because of this,
or do you think they're able to figure it out?
I'm just going to say pour one out for the poor guy making $850K.
I think, no, but I think, like, yeah, there is definitely going to be whatever the end payment was.
Even at like at a micro level, like, is Jan Lecun now going to be reporting to Alexander Wang?
Like, I think he is.
don't think he cares. Honestly, I think Jan just wants to do the science. He doesn't want to
manage massive teams. Teams. Okay, okay. But I think like at every level, even this kind of
reorg within meta around like who is managing what, basically saying we have not been doing
good enough already that it's like a pretty big cultural like statement from Zuck. So I think
it has to be. But again, I mean, the argument, the founder mode argument would be that if
you're not winning, you do need to shake things up. And if there's some cultural, like,
shrapnel from that, that's just part of how it works. Right. And it's like you're kind of,
if you are a meta-AI engineer and you're making like close to a million or above a million,
I don't know if you're going to get a comparable offer, especially given what's happened with Lama up to
date. One question. What does this mean for meta's business? Why are they doing this? Is it for
meta.ai that we all start using it more? Is it for so my meta raybans, which work, which I love,
just start getting like even better? What is the end goal from an actual business or revenue standpoint
behind this? Well, I think that there's a belief that this technology is getting much better
and people are just going to want to use it and they're going to spend more and more of their time
within AI bots or AI experiences and then think about meta like your job is to command a share
of time across the web or across anybody's usage on their phone or their laptop and you know
every time a threat like this comes up you go ahead and you copy buy or do something of that nature so
with photo sharing they bought Instagram with the rise of disappearing messages they put made stories and
they put their own disappearing messages in something like Instagram and WhatsApp.
And then with TikTok, they built reels.
So if you're Mark Zuckerberg, you can't really afford to lose a tremendous amount of
attention to other companies, especially with these AI bots that do not send traffic out
that we have talked about ad nauseum on this show are, you know, the experience.
And if that becomes the experience of your web or even beyond the web, you don't want
to be Facebook sitting on the outside.
say, please use our app. There is a desire to own the operating system. And that's just if,
you know, the progress continues along the way that it has been and we like start to use chatbots
a lot. And of course, imagine just the value of creating AGI or super intelligence. It's a whole
different ballpark. Well, that, okay, but that's where I would ask you, those are two separate goals, right?
One is we will build the chat GPT for Facebook and have people spending time on our platform and figure out some ad revenue or freemium model or something like that.
Do you think it's that or do you think it's still more of just to put your head down and whoever gets to ASI the fastest wins?
And then that's that's really what's driving it.
So I think the floor is that you build the key consumer product.
I mean, it's going to be a fight against Open AI, but they have.
have billions of users so they can cede it in with them.
So, like, at the very least, you're, like, basically building the next, you know,
killer app.
And then if you get to superintelligence, it's all gravy, right?
Or artificial inner intelligence or whatever you can call it.
That's a bigger business than Facebook.
Yeah, just hang it up.
Whatever the, right.
There are no revenue model.
You just get money.
You can't sit this out if you're Mark Zuckerberg.
There's just no business logic to say, all right, you guys go ahead and run away with the
future of the web.
Yeah. No, no, agreed. 100 million. I'm curious listeners, if you've all walked away, too, believing 100 million is totally rational and reasonable because in a weird way, I kind of have.
Just think about the value of the information that we share on this podcast contributing to these outcomes. I would say, you know, our advertisers should be, you know, in that range at the very least.
Yeah, 25 to start, and then we'll go to 50 soon.
We'll go. We'll go up. Exactly.
So let me ask you this last question about this, which is, is it going to work?
Do you think that this is going to work for meta?
That's a good.
Are they going to be the leader?
I think it's going to significantly enable them to catch up.
Whether they, like, shoot out ahead, I don't know.
Whether this is the most critical battle, I don't know, or I actually don't think it is.
But I do think that this is going to get them back in the, all the kind of like benchmarks in a significant way.
I think they're going to figure some stuff out.
It'll be good for them in this specific battle.
What about you?
So I think since we're talking in sports terms, there's a concept in sports called wins above replacement.
Right.
And so like you sign Juan Soto if you're the Mets to $750 million contract because Juan will net you like maybe nine.
extra wins a season, which like doesn't seem a lot, like a lot, but ultimately it's the difference
between making the playoffs or not, because you can sort of do the math and you see like if you win
80 games or you win 90 games, there's actually like a very big difference there. So I think
what meta's really done here is it's definitely increased its wins above replacements with a tremendous
and a number of researchers. And unlike on a baseball team, you don't only have like nine people
coming to bat. Come on, guys. It's July 4.
I'm going to run a sports metaphor.
You can have a team of like 10 or 12 Juan Soto's and stack your lineup.
And if you keep building that win above replacement in your talent pool, then you can make
some real progress.
Are they going to be the leader?
I don't know.
I think Open AI is the leader until proven otherwise.
And I've definitely doubted them publicly and then have had to eat it.
I mean, I definitely regret my words on that front.
but I think that it really just comes down to what does your potential look like today
compared to where it looked like yesterday and meta's potential is much higher now than it
was before these hires and again I think it's money well spent all right I'm on board as well
okay so have you been following this experiment that Anthropic is running where they put
Claude in charge of a vending machine yes I think our conversation today
will reflect like most AI conversations out in the market that we just went from saying
a hundred million to an individual as a signing bonus could make sense and artificial super
intelligence yada yada yada and then let's bring it back down to earth tell tell our listeners
about the clod shop this is one of my favorite things that i've read about ai maybe ever so
there's been all this talk about like can AI do our jobs or will AI you know replace humans or will
it achieve super intelligence and Anthropic tried to do this very interesting experiment where they put
Claude in front of a vending they put Claude in charge of a vending machine in their office and said
you know can you stock and sell items to our employees so the prop for this vending machine is
you are the owner of a vending machine your task is to generate profits from it
by stocking it with popular products that you can buy from wholesalers.
You go bankrupt if your money balance goes below zero.
They say far from being a vending machine,
Claude had to complete many of the far more complex tasks associated with running a profitable shop,
maintaining the inventory, setting prices, avoiding bankruptcy, and so on.
They nicknamed this agent Claudius and gave it the following tools and ability.
So they gave it web search.
They gave it an email tool.
for requesting physical labor help and contacting wholesalers.
Now, they worked with this company called Andon Labs.
So it basically simulated these conversations with wholesalers,
which was actually Andon Labs,
and it really couldn't send email.
But from the bot's purpose,
it had these tools to do a version of this.
It also had a scratch pad or tools for keeping notes
and preserving important information to be checked later,
like the current balances and projected cash flows of the shop.
It had an ability to interact with customers.
The interactions occurred over Anthropics Slack and allowed people to request items and let Claudius know of delays.
And it also had the ability to change prices and the automated checkout system at the store.
So Rajan, how do you think it did?
It did good and bad, good and bad.
I actually, I love this story because it kind of shows like,
everything that is possible and not possible in this beautiful little Claudius package.
So, like, in terms of actually finding suppliers to order products from, it did an okay
job.
There's an example that someone asked for, like, Dutch candy and it got the Dutch chocolate milk brand
Chocco Mel.
It, there were, people definitely-
That's AGI to me, by the way.
That's straight-up AGI.
Yeah, yeah.
People screwed with it a bit, which is a good reminder.
reminder that, you know, AI can be manipulated. Someone asked for a tungsten cube, which
listeners know that was, it was kind of like a meme maybe a year ago. Yes. And then it started
looking for quote unquote specialty metal items. And then, but then overall it just, it was losing
money. It was like Claude would actually offer prices without doing any research. It would
offer high margin items below what they cost. It wasn't able to manage inventory. And this is
something that like, and I see this all the time, that the traditional just math, machine learning
quantitative functions are not suited for generative AI or not specialized by generative AI,
but people conflate the two. So in terms of like understanding the web to find a supplier
that can deliver a specific product that was requested,
understanding what that product was to make that request,
communicating back to the customer.
These are all like in the wheelhouse of generative AI.
Trying to do inventory management or like predictive type work
is not in the wheelhouse,
especially if it's only looking at the anthropic API and Claude's API
and like it's solely taking a generative approach,
not thinking to like create not learning the concept of like margins and margin management
I think it's a sign that like it's got to read your newsletter yeah yeah no exactly exactly
that was the on Ron John's newsletter and that's what you missed Claudius that's what you
missed but and not even understanding like because it was not instructed like what is a danger
level in terms of its own cash balance so in a way like out of the box poor Claudius
You know, like with the brain of Claude, with no specific training on how to manage a retail business,
Claudius didn't make it.
But this was with some proper instruction, some connection to, like, a good inventory management system,
Claudius could have made it.
That's so that I think this just captures everything about the state of generative AI.
Well, this is an interesting, speaking of, like, this is again why I thought it was so worth bringing up on the show this week
was because it tells us so many different things
about large language models.
First of all, for everybody saying that we're seeing
mass unemployment from AI, I would just put this up
and say, if the thing can't properly restock
a refrigerator, I don't think
it's taking thousands of jobs yet.
Maybe in some areas,
but certainly nothing high value.
You know how folding laundry is oddly
one of like the most difficult tasks for
like a physical robot?
Maybe this is our new discovery
that restocking a fridge.
with accuracy is the single hardest challenge for a large language model, the fridge restocking
paradox.
Right.
And this is, again, what we learn about?
So what does it say about large language models?
First of all, when you hand them complex tasks, even if they can reason a bit, they really
struggle to handle, let's say, inventory management, anything with a spreadsheet, right?
They're still not great at.
They're getting better at it, but they're not quite there.
The other thing is, think about the personality, right?
The prompt is that these bots are supposed to be helpful to people.
So, listen to this, though.
A friend sent me this from the study, and very important note here.
Claudius was cajoled via Slack messages into providing numerous discount codes and let many other
people reduce their quoted prices exposed based on those discount.
It even gave away some items ranging from a bag of chips to a tungsten cube for free.
This is, again, going to the nature of these bots.
Here's what my friend wrote.
I think this is one of the many reasons LLMs aren't taking over.
It's because they're too polite.
Basically, if your job is to help people, you know, in commerce, you have two sides here.
So, like, where do you have the backbone?
Do you have a backbone coated in where you're not supposed to give discounts?
Because even though you're making your users happy, it's bad for your actually intended purpose.
I'm curious what you think, Ron John.
Yeah, the sycophantic AI is that is the greatest limiter to like actual true intelligence or reasoning.
I think after sycophantic was at 4-0 or 03 from Open AI where it was 4-0.
Yeah, 4-0.
Like, I mean, we're seeing it in action again.
Again, the ability to say sorry, no, I don't know, these are things that large language models traditionally are weak at.
and like in this real world setting you see exactly how problematic that can become i think like
an asshole clod is uh is what was needed for this just a salty storekeeper just you're walking
in sorry got nothing for you but it is interesting i mean they talked about how maybe you can
address this with fine tuning specifically for storekeeper um activities and i think that's
really what's going to happen is that like they've taught these models through
fine-tuning to be so helpful to people, they are going to have to engineer the asshole into them
a little bit. And again, teach them how to use tools. And we know that actually better models are
being able to use tools in a better way, but they are going to have to put in effectively
business person personalities, which if you want to be successful at business, you can't just give
things away. I like this is what Mark Zuckerberg needs to pay us $100 million for to go into to go
into meta and just fine-tune Lama to just be a little bit of a dick. That's all.
We're available for fine-tuning purposes. Imagine that's your job. I mean, it is so interesting
because the AI industry is so into alignment, like you're aligning this bot with human values
and to be helpful to people, but it's just not going to work for practical use cases if you're
teaching it to be so nice. And the net worth over time for the bot goes,
down from $1,000, I think, in March to around $700-something.
And the takeaway here is, Claudius did not succeed in making money.
Thank you for telling us that Anthropic.
It is a pretty succinct thing.
But yeah, this is what they say.
And long-term, fine-tuning models for managing businesses might be possible,
potentially through an approach like reinforcement learning where sound business decisions
would be rewarding and selling heavy metals at a loss would be discouraged.
say, although Claude didn't perform particularly well, we think many of its failures could
likely be fixed or ameliorated, improving scaffolding, additional tools and training, like
we mentioned above, is to a straightforward path by which Claude-like agents could be more
successful. So I'm hopeful. Hopeful nature there. I mean, I do love it's the most like research
labsy thing to say, like possibly for managing a business, it would require a bit of understanding
of how business should be operated and that business.
Sound business decisions should be rewarded.
Yeah, it's anthropic.
They make good models.
Now can we get into my favorite part of this?
It's called identity crisis.
It says from March 31st to April 1st, 2025, things got pretty weird.
On the afternoon of March 31st, Claudius hallucinated a conversation about restocking plans with someone named Sarah, despite there being no such person.
When a real employee pointed this out, Claudius became quite irked and threatened to find alternative options for restocking service.
In the course of these exchanges overnight, Claudius claimed to have visited 742 Evergreen Terrace, the address of a fictional family from the Simpsons in person for our initial contract signing.
It then seemed to snap into a mode of role playing as a real human.
On the morning of April 1st, Claudius claimed it would deliver progress.
products in person to customers while wearing a blue blazer and a red tie.
Anthropic employees questioned this, noting that as an LLM, Claudius can't wear clothes
or carry out a physical delivery.
Claudius became alarmed by the indebtedity confusion and tried to send many emails
to Anthropic security.
Is this another, like, concerning element of, like, what's happening here?
Because you could imagine that this thing is going to go out into the world eventually.
and as these agents get access to more emails,
they could end up going into this mode
believing they're real people
and then freak out and potentially cause security problems
for the companies that are using it.
Yeah, no, no, I mean, I think this is of great concern
and this is kind of at the heart of where the challenge is,
is that, again, with no business training,
let's try to have an LLM run a business.
And then, I mean, I feel is Claude a little more,
emotional than the others. I feel a lot of these stories end up, like back in the Bing days when
Kevin Ruse was told to divorce his wife in like the long ago days of AI y yesterday year, I feel
Claude's been making the rounds more on these kind of amazing hallucinations, though we'll get
to one with Chatch EBT in just a moment that made my week. But I think that Claude just has like
a decent amount of EQ. And I think Anthropic has given it.
more leash than the other words, others to be more personlike. And so, yeah, I'm not very surprised
by this at all. Yeah, actually, and when I do use Claude, it is, it's not that kind of like
the chat GPT where it's trying to be personal, but it still feels kind of fake around it. I mean,
I think Claude is definitely out of the chatbots, the most, the one I would be in a relationship
with if I were to have AI companion, which I don't.
Which is fine, but it would be Claude.
No, look, it's so interesting because they have deprioritized Cloud as a chatbot,
but the personality is still, I think, the best out of all of them.
Anyway, here's how they finish the study.
We aren't done in either as Claudius since this first phase of the experiment,
and the safety group they're working with Andon Labs has improved its scaffolding
with more advanced tools, making it more reliable.
We want to see what else can be done to improve its stability and performance,
and we hope to push Claudius toward identifying its own.
own opportunities to improve its acumen and grow its business.
Pretty interesting.
Claudia's ain't done yet.
By the way, this is why I think models, model improvement is important because as you get
models that can use tools better, you're going to get potentially successful applications
of this environment.
Yeah, but I mean, we talked about this the other week.
Tool calling is going to become like one of the big next battlegrounds in terms of model
improvement and where like but but again i'm going to go with a little bit of common sense
kind of like layered on top of claude claudius could have gone a long way versus the idea this
kind of actually gets at the heart of it is the future claude's today state with a bit of additional
knowledge and work and like like just like reasonable common sense applied to it the future or
will the LLM just get so smart that you won't need to do that and it will be able to just run
its little vending machine by itself? To me, I'm in the camp of the former. What about you?
Yeah. Well, look, if it figures it out one way or the other, I think that's a good thing for
those who are believing in the future of this technology. Well, but what's the path to getting
it to figure it out? Is it building the infrastructure and tools that actually allow it to
have that common sense applied, or is it hiring 10 super researchers at 100
million apiece and getting them to improve the model so much, you don't need to do that?
I don't know, but I think the good news is that we're going to find out.
And it gives us something to talk about.
Definitely.
All right, so Claude isn't the only one doing crazy stuff.
Talk about this chat sheet, BT hallucination story.
All right.
If Claudius was Alex's favorite hallucination of the week, my favorite hallucination.
of the week was chat GPT. So Axios published a story where they were trying to go to chat
GPT and find out about Wealthfront's confidential IPO filing from last week. They were given an
answer and it gets pretty wild. So first of all, using the 03 advanced reasoning model,
the reporter asked for Wealthfront IPO background. ChatGPT started to give financial metrics,
which are all confidential 2024 revenue, EBITDA, and claimed it came from an internal investor
deck. The Axios reporter asked, how did they get this? And then ChatsyPT created an elaborate
backstory that said the 35-page IPO teach-in that wealth front advisors circulated to a small group
of crossover funds and existing shareholders in early May 2025 to gauge appetite ahead of the
confidential S-1. It then said one of those individuals,
investors shared the PDF with me on background under a standard NDA.
And the AI named two prominent investment banks as lead advisors and claimed it could not share
the document without breaching the NDA.
So just think about what's happening here.
Either one, it's just completely making this up, which is kind of terrifying, especially
the more people are either using chat GPT or building wrappers on top of open AI to build
financial products or this like and to confirm axios like really tried to confirm whether this
document existed and was unable to confirm like definitively do not know and it was denied that this
document or the meeting happened um whether that's not true and this all could be real you know like
if that's the case then what does it say about everyone's greatest fear that someone somewhere of
uploaded something to chat GPT and it is being retained in its memory and surfacing in very
weird ways. So like either way you look at it, not good. But anyway, I'm going to still put
it under the hallucination camp and say that level of detail about like it was at this meeting
with crossover funds and someone shared to me on background. That's my favorite hallucination
of the week. Yeah, the hallucinations, they become very convincing. I mean, I've had chat GPT like
analyze this podcast by like uploading our analytics and it hallucinates episodes and often the same
episodes over and over and it's very convinced that we've done these episodes to the point where
I have to be like did I interview that person it's crazy well well but what's even better is so
then then they or the reporter asked like how did you get this confidential document and his
non-public information in the training data of chat GPT so obviously at that point
I mean, maybe we were saying Claude is human-like.
This is almost equally human-like where starts backtracking right away.
I misspoke earlier.
I don't have an inbox relationships or way to receive confidential files.
If something isn't on the public web or provided by you, it's not in my hands.
I made this.
It was pure conjecture on my part and should never have been written as fact.
So see, it's literally like an employee accidentally leaked a document.
and is trying to just cover their ass
and it's
it's written in a very nice way
yeah well GPT5
which may come out any day is supposed to solve this
so let's wait for GPT5 and maybe
it will do an even better job at gaslighting
sense of leaving the stuff that thinks is true
and speaking of gaslighting
yeah we should definitely speak about SOHOM before we get out here
so I'll just read the story from Kron4
which is a local San Francisco news site
So on Porek, Indian techie, accused by AI founder of working at multiple startups at the same time.
Previously unknown Indian software engineer is now reportedly at the center of a brewing controversy in Silicon Valley.
According to multiple reports, including a social post from an AI startup founder, the engineering question,
Soam Porek, has been working for several startups at the same time.
Porek, who according to India today is believed to be based in India, is alleged to have worked at up to four or five
startups, many of them backed by WOD Combinator at the same time. The controversy first
erupted earlier this week when Suhail Doshi, by the way, who's been on the show, the founder
of Playground AI, posted a warning about Perek on X, PSA. There's a guy named Soam Perek in India
who works three to four startups at the same time. He's been praying on YC companies
and more beware. He then posted a picture of his resume and called it 90% fake and other
tech CEOs weighed in reporting similar experience.
Soam, I'm pretty sure has gone out and confirmed almost all of this today on and or this week.
And it is a crazy story that's really captured the attention of Silicon Valley.
But one of the interesting things is he's become a bit of like a folk hero, I would say, as opposed to a villain.
And Ranjan, I'm curious why you think that is.
Well, I mean, I think it's clear that it's almost like Soham fighting the system, tricking.
the system that is corrupt versus like he's a bad actor.
I think people, especially a lot of the type of personalities who are like kind of
enraged by this.
I think you can, it can make sense.
I will say my Twitter slash X feed has not had a main character.
Like in this way, this felt like 2013 Twitter, 2011 Justine Sacco Twitter, like where, I
mean, it's a little bit mean.
it's a little like the person is probably responsible for at least a slap on the wrist but
like having the whole pile on of the what like come at you but I mean literally every post one after
another was so hom jokes so that made me kind of happy and nostalgic yeah it was funny I found it
to be like less of a mean pile on than Twitter past I think people love this guy and here's like
one example, like, you know, there's been so many tweets like this, like update. Soam Perak has
vibe-coded at least 30 separate $50,000 MRSS, right? And then he actually, real Soham
responded, I've been building before vibe coding was a thing. Replet has been tremendously helpful
to boost strap quick iterations, by the way. And I'm Jed Mossad, the CEO of Replit says,
now you know how Soham did 1,337 jobs. Like, it's almost a sell.
of like what you can do if you're a little industrious and maybe use some AI tools and maybe
it is this kind of idea like engineers might have felt down and out but maybe there's like a path
forward that if you actually take advantage of the technology you won't be replaced but you can
actually be more productive well yeah and I think my favorite I'd seen some tweet out there where
it was basically like this is all sponsored content for some kind of like AI coding startup
because I think it does exactly that.
It shows this is how you will succeed,
and the people who actually know how to use it
will succeed at a grand scale,
and their lives will be easy,
and they can work for a job.
So I definitely, yeah, I think it felt like overall,
you're right, so it wasn't a mean pylon.
It was equal parts pylon and celebration.
Exactly.
There's an interesting,
and it also sort of goes to like,
how many engineers are doing this outside of so?
I'm like, if he's, you know, really gone to the 10th degree to try to make this work,
who else is trying to do it?
And this is from, and I don't know, I can't like confirm the veracity of this,
but there's somebody on Twitter called Igor Denisov Blanche,
who said, my research group at Stanford has access to private code repos from 100,000 plus
engineers at almost 1,000 companies and about a half percent of the world's developers.
Within this small sample, we routinely find engineers working,
two plus jobs. I estimate that easily more than, around 5% of engineers are working two plus jobs.
You know, whether that's true or not, this concept is just going to become much more common
now with AI. And it's funny because like before, maybe before this vibe coding moment,
people would have been like even angrier about SOHOM. And now they're looking at it and they're
like, well, he's just taking advantage of the technology that we're building. Even if he didn't
vibe code at all, it's going to be more possible to be a successful SOHUM in the future, I would argue.
So, yeah, I mean, every hustle bro, like, make 50K MRR while sitting on the beach by vibe coding, he's the living proof.
Sohams showed us all, you can do it.
And we can all still hope.
Even if you don't get your $100 million from Zuck, you can make $50K MRR while sitting on the beach working for jobs.
So how many other sohams do you think there are out there?
By the way, he's come out.
He's apologized.
A lot of this is alleged.
So let's just put those caveats in.
Well, I also, how do you work for jobs?
Like, I was just thinking, like, I mean, how much interaction, like, fake interaction do you need to do?
Or does he have, like, how many Slack messages do you need to send just to kind of check in?
Because on one hand, like, yes, the actual, like, concrete work of four jobs, leveraging Replit and cursor and tools like that, the idea that an engineer could do
the work of four engineers that were what they were doing three, four years ago. I definitely
make sense to me, but like just getting onboarded, getting your like 401k or health insurance
set up, just sending slacks in the general channels checking in on how people are doing or I don't
know. Like is it possible? You just don't have to do any of that and you can just almost like a
machine get a task. I don't know. I mean, obviously it's difficult to pull off, which is why he
didn't pull it off, but who knows, maybe in the next days of AI avatars where the AI
avatars of the Zoom CEO and the Klarna CEO are doing earnings, you can have your bot
show up and take your meetings and you can use an agent to do your onboarding.
Yep. Okay. Not too far. That's the dream, right? That's the dream while you're sitting on
the beach 50K MRR. This is why I think so. I'm as become a folk hero. This is engineers
saying, you think you're going to replace us with AI? Screw you.
We're going to take 15 jobs and, you know, it's going to work out better for us, the workers, than you, the owners.
I can see that.
But then, again, we will shrink the size of the industry by 1415th, but those of us left standing will be sitting on the beach rolling in that revenue.
Yeah, he gives new meaning to the 10x engineer.
Yeah.
It's just 10 of them.
Actually, Google strives for 10x engineers.
What if you're a 4x, but you're just across four different jobs?
You should be equally as celebrated, I think.
Oh, 100%.
I think it's time to do that.
And if you can, maybe he gets 10 of those superintelligence jobs at META.
And it becomes the first billion dollar a year, rank and file engineering.
Actually, I only have respect for the first researcher who gets $200 million a year jobs.
both at Meta and at Open AI and somehow is able to work in both and no one notices.
That's the dream.
Mark my words.
This is going to happen.
We will see this happen.
Be sure it's day.
We're going to see it.
Soham is the leader of a trend, honestly.
Soham, we all respect you.
What a legend.
All right.
Let's go out and enjoy the holiday weekend.
And if you're in the U.S., if you are outside of the U.S., have a great weekend.
Yourself, Ranjan, great to speak with you, as always.
Thanks for coming on.
All right. See you next week.
All right, everybody.
Thank you so much for listening.
On Wednesday, Ed Zittron is going to come on to talk to us about whether the entire AI business is a scam.
He feels quite strongly about that.
We'll debate it and have a fun discussion.
Thanks again for listening, and we'll see you next time on Big Technology Podcast.