No Priors: Artificial Intelligence | Technology | Startups - Listener Q&A: AI Misconceptions, The Reality of Regulation, Infinite Context, Incumbent AI Execution and Startup Ideas
Episode Date: June 8, 2023

This week on No Priors, Sarah and Elad do another hangout to answer listener questions. Topics include debunking common misconceptions about AI and its implications on the world, the analogy to nuclear power and nuclear safety, the impact of larger context windows, developer productivity, incumbent announcements of AI products, and some requests for (fat) startups.

No Priors is now on YouTube! Subscribe to the channel on YouTube and like this episode. Sign up for new podcasts every week. Email feedback to show@no-priors.com

Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @SchimpfBrian

Show Notes:
[00:21] - What Are People Getting Wrong About AI Right Now? / New Capabilities of NLP
[04:35] - Nuclear Power and Safety Concerns
[11:12] - Emerging AI Companies and Research
[15:54] - China's Hardware Sanctions and Funding Ramp
[20:34] - Innovation in Heterogeneous Compute Infrastructure
[28:08] - Enterprise Stack and Decision Making
[33:44] - Data's Impact on the World
Transcript
Hello, No Priors listeners.
We're excited to just do another hangout episode with me and Elad and answer listener questions.
I think a fun one to start with would always be a place where we're disagreeing with the market.
So I'll ask Elad, what are people getting most wrong about AI right now?
Yeah, I guess there's two or three things that I wouldn't say people are necessarily getting wrong, but I just feel there are some misconceptions. The first one is I feel like a lot of people are treating this as an extension of the last decade of machine learning that we've seen in the sort of convolutional neural network and RNN world.
And everybody keeps talking about it as if it's that old world and they keep emphasizing certain
aspects of data and other things which are important but not as important as they used to
be.
And in reality, we've had a technology disruption.
We've shifted to two very different architectures: diffusion-based models, which draw on statistical physics, for image generation.
And then on the language side, we moved to these large language models, which some people are now calling foundation models.
And fundamentally, that's different from the prior wave of NLP in terms of capabilities, in terms of the way it works.
But also in terms of insights around things like, you know, just the fact that you now have this really interesting, like chain of logic or chain of thought style processing of information and the ability to act and synthesize information in a way that never existed before for NLP, for example.
And so I think one big misunderstanding is, oh, this is ML, I've been doing ML for 10 years and it's the same thing, when it's totally different. So I think that's one big area where I keep seeing people get things wrong, or at least where there are these misassumptions. Second is I keep getting pinged by people saying, hey, what's working,
what's working? And there are a few things that are truly working at scale, you know, OpenAI and Midjourney and a few other things. But the reality is it's been six months since ChatGPT came out and most people became aware of this. You know, I think we both started investing or being involved with the area on the generative AI side much earlier than that, but the starting shot for the industry was six months ago, and then GPT-4 came out maybe three
months ago. And so everybody's acting as if this is an old thing. And again, I think this ties
into the prior point. This is not a normal extension of what NLP used to be. Like, this is a
fundamentally new set of capabilities. And so when people are saying, well, look, no enterprises are
adopting it very much yet, you're like, well, it's been six months since most people realized this
was that important, and six months is one planning cycle for a big enterprise, right?
So people are just planning what to do. So I think that's a second one.
I actually went on a walk with a growth investor that I have plenty of respect for,
and they asked questions like, oh, what's working? What should we invest in? Do you have
anything that, you know, popped off that we should keep an eye on? I'm like, yes. And we actually
had a very similar conversation around what adoption in the enterprise looks like. This investor
had a sort of negative point of view: oh, it's not enterprise ready.
Like, this is just a hype cycle.
And, like, we're not actually going to be investing here.
Like, we explored it.
We met 100 companies.
We're done.
And my prediction is that other people will arrive at that conclusion too.
And like any other wave of enthusiasm, you'll see people abandon it as a technology as well.
Yeah, I think people are just prematurely assuming it's a continuum from before, and therefore there's nothing new here.
And I just think that's wrong.
And I think people will realize that.
I actually think there's going to be more hype rather than less coming,
simply because certain things are really starting to work at scale
from a revenue perspective and really quickly.
And we've seen the first wave of viral apps
in terms of things like Lensa or other things that really ramped quickly
and then went away.
But those are signs of real traction and real usage.
And so I just feel that it's very early in the hype cycle.
There's more to come, but it's a different technology.
And I just think people don't appreciate that.
Or I should say it's an extension of some technology,
in some ways, but fundamentally the capabilities are very different, right? You're still using
deep learning, but, you know, it really implies a different modality of how these things work.
And then I think the last place, the third area that I think people are kind of getting things wrong
is the mad rush to call for regulation by people working in the industry strikes me as
very unusual and a bit naive in terms of what that actually means. And so I'm on some private,
you know, chat groups where people are talking about regulation and they're like, oh, well, the
regulators will assign a panel of experts from the industry who will then decide what we're going
to do. And I said, no, that's not how it works. You know, what happens is, you know, you have some
agency established and just like people keep talking about how it's just like nuclear. And you're like,
you do realize that once the NRC, the Nuclear Regulatory Commission, existed, we had no new nuclear designs approved for the last 50 years due to this agency existing. It's not going to help us.
It's going to hurt the collective motion forward. And
There's so much at stake in terms of the positive things this technology can do for the world.
Like if you think about global equity around healthcare, around education, this is the single biggest motive force to improve things for anybody around the world, irrespective of their upbringing or background or any other aspect of diversity. Healthcare and education are two examples of global equity where this is the single biggest thing that can impact things for the positive going forward.
And so, you know, that's one of the other areas where I kind of feel like I differ from the pack. I actually am quite a bit of a doomer in the long run. But in the short run, I think that regulation or misregulation can really take things down a dark path. So those would be my three.
How about you? Yeah, let's talk about nuclear first, and we'll come back to that
because you brought it up. I think it's actually really interesting that people want to put it
in this bucket with nuclear and climate of existential risk. And I'm not saying that there isn't
a version of technical progress here that leads to existential risk, as I think you implied.
But it's funny the way these things relate. Because, as you said, with nuclear regulation, I think it's, like, broadly agreed upon in the scientific community that nuclear power is, if you look at the tradeoffs of waste management, one of our better options. And then you look at the progression since second- and third-generation reactors were developed, and the absolute, like, complete freeze on nuclear power in the United States and in many other countries because of proliferation concerns. Like, that's a huge impediment to one of the major contributors to actually, you know, at least
creating some progress on the climate change front. Yeah, it's proliferation and broader safety
concerns because the U.S. is still at something like 20-ish percent nuclear power. Japan is 30 percent.
France is 70 percent. So people have been using nuclear and they've been using 60-year-old reactors
this whole time without any real issues, if you actually look at death rates or things like that.
Right, right.
What I'm pointing out is that there have been two generations of improvement in terms of efficiency and safety of, like, reactor development. And specifically and concretely, because of the regulatory cost to get a new reactor licensed, we're building none of the newer, safer designs.
Yeah.
And to your point on safety, the existing reactors are incredibly safe.
If you actually look at death rates from nuclear, they're incredibly low. You know, there maybe was one person who died during Fukushima from the nuclear reactor, when, you know, the biggest earthquake and tsunami in decades hit the entire coast of Japan, and there were something like a dozen other reactors on the coast that were completely unaffected, right? And so it's just very overblown and overstated. And I kind of worry that
in the short run, the same thing is happening on the AI side. Again, I'm a long-term doomer.
Like, I think there's real risk, but in the short run, I think we're bundling hate speech and
bias and all sorts of things with existential risk. And I've never seen the technology industry
rush into the arms and embrace of regulators which they don't understand and have never dealt
with before. And if you look at the crypto community or you look at people who've worked in
healthcare or in ad tech or other areas, they understand what it really means to be regulated
and the potential for capriciousness, right? There are some really great regulatory actions,
but there's a lot of things that tend to be arbitrary at times too. And so I just think it's a very odd moment in the history of the technology industry in terms of how people are acting.
But I guess on a more positive note, do you want to talk a little bit about some of the areas that you're excited about in terms of either big picture benefits of AI, or, I know you've been thinking a bit about codegen or other areas?
Yeah.
I think maybe adding to your point on global equity, there's one way to look at these models as encoding a huge amount of the knowledge that is available publicly on the internet, most simply, right? And the idea that you don't want to give that to as broad an audience as possible, when it is so cheap to offer even some very flawed representation of the knowledge in the world, to me is ridiculous, right? And so I think there's a sort of subpoint within education, like structured education with curriculums and the ability to train in many different fields, from science to primary education or whatever it is. But I think of it very much as: do you want people to have, you know, information access in the way that people think of, like, the web and search as something that you want universally accessible?
Yeah, on codegen.
So that's one of the areas I am most excited about, not only because, like, we have seen
a product, you know, as you said, one of the few that is, like, massive, has real impact on workflow, and is now generating real revenue for Microsoft and for others.
But because I think we're really early, right?
Like today, you know, you have Copilot looking at your five open files, your five most
recent files, and doing function completion.
And, like, that already has such a massive impact on productivity.
But I think we're really early in figuring out how to deliver more context to these
models and have them work in different ways, right, and in different user experiences.
And so I think, you know, Graveley and the whole Copilot team, they tried a bunch of different things, including, like, everybody starts with chat, and ended with this continual autocomplete, which is a great experience.
But I think there are a lot of really interesting ideas that are floating around now, from, I think, the more obvious, like, search, to the very fun: okay, well, what if you just had, like, you know, an AI junior developer that takes any Jira ticket or Shortcut or Linear issue and makes a best-effort guess at actually writing that code with the context you give it and submits a pull request? Like, I think people would really love that interface. And the question is just,
how can you get some of this stuff to work? And I think we're going to see a lot of progress on
that front. Yeah, it definitely seems like there's a few different companies starting to work in that
area now, too. Like, there's magic.dev. I think, you know, there's another company that Redpoint just announced they were backing. Yeah, this is Jason Warner's company. I do think it begs this
like, broader question that's become more and more interesting as the context window for the available models has expanded, right? First, OpenAI to 32K and Anthropic to 100K.
And you see these somewhat exciting research headlines like, you know, a two-million-token context window. And the reality is, first of all, it's not quite true. And second, like, you know, I have a strong point of view here that context will expand to fit the window, right? Like, the amount of information or the instruction you can give to a general
intelligence is much more than 800K. And given the relationship of the context window to, like,
attention architectures and the quadratic limitation here, I think that it's going to be a
continual area of investment of figuring out how to get, like, efficient context into a model
as a product insight, right? And I'll give one, like, very concrete example.
We held an AI dev tools hackathon for a bunch of, like, college-age and new-grad builders at the Conviction offices this past weekend. And thankfully, with the support of Anthropic, we were playing around a bunch with big context windows. But even if you just want to feed in, for example, like, the Kubernetes documentation, you hit that window very, very quickly, right?
So I think there's a lot of enthusiasm from builders about what you can do with that window.
But, you know, catastrophic forgetting aside, the idea that, like, 100K or even a million tokens is going to solve all our problems, and, like, you just ask your questions and dump in all context naively, is ridiculous to me.
I think it's a really important area of product and research work.
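As a rough illustration of how quickly a fixed window fills up, here is a minimal Python sketch of naively chunking a large document, like the Kubernetes docs mentioned above, so it fits a model's context window; the four-characters-per-token ratio is an assumed heuristic, not a real tokenizer.

    # Naive sketch: split a big document into chunks that fit a context
    # window. Assumes ~4 characters per token, which is only a rough
    # heuristic; use a real tokenizer for accurate counts.
    CHARS_PER_TOKEN = 4

    def chunk_for_window(text: str, window_tokens: int = 100_000,
                         reserve_tokens: int = 4_000) -> list[str]:
        # Leave room in the window for instructions and the model's answer.
        budget_chars = (window_tokens - reserve_tokens) * CHARS_PER_TOKEN
        return [text[i:i + budget_chars]
                for i in range(0, len(text), budget_chars)]

    # A ~3M-character documentation set still needs around 8 round trips
    # even with a 100K-token window, and naive chunking loses any structure
    # that spans chunk boundaries.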
Yeah, I think in two years it'll be reasonably solved in terms of the size of context windows that will be usable.
Like, I'd be happy to take that bet, but we'll see.
Like, I think the context windows are going to get very large.
I agree they're going to get very large, but do you agree that people will just fill it, and it will continue to be a question of how to be efficient with that?
Up to a point, yes, I guess the question is, say that you had a billion token context
window. Is that big enough? Like, where does it asymptote in terms of value? And you can think of it
in a few different ways. Like, in your case, the question is, like, how big do repos get? Or you can
ask in the context, I know some people who want to use LLMs for private data, or I should
say, you know, private data across many different users where they want to be able to differentiate
and dump all the specific user data into the context window
so they don't sort of have any cross-talk in terms of data.
And, you know, it's a data privacy approach in some sense.
And so the question is, like, how many tokens do you need to represent all the data
that's relevant to a person relative to a specific query that you're worried about
relative to PII or other sensitive information, right?
And so I agree, like, the windows will get bigger and bigger and bigger,
but eventually some of these things asymptote.
It may take years or decades to asymptote, right? Like it happened with the CPU on your computer, or bandwidth, you know, which is still probably limiting
for certain applications, although not all. So I think it's an interesting question of like,
what number is it enough? And you could probably come up with some heuristic for that, right?
Just based on the types of documents that you want to have access to and the specific
use cases that you're working against. Yeah, I guess my view would be, I don't want to take the bet against the, like, billion-token context window with you. But my point is more that I don't think we're close to the asymptote today, where people don't have to worry about it. And I think even so,
even if that window gets very large, like the state of work today is that ordering matters.
And there's very little research as to, like, what sort of structured information works best. Maybe you'd just describe it as: okay, as soon as your context window gets to be, you know, 32K, 100K, a million tokens, like, your prompt engineering becomes a much broader field. Right. That's probably the general
optimization. Prompt engineering, and then, like, probably a little bit different than that, but just, like, how do you structure data that is put into context for these models?
Yeah, exactly. Yeah, I think there's a data structuring piece of it and there's the sort of memory piece of it. And so I think both of those things will start to feed into, you know, how do you make tradeoffs between different aspects of what you feed into a model, and how do you do that? So yeah, I think it's very exciting times. There's a lot of very basic things to work out. And again, I think this is back to that point of it's very early in this field. You know,
And I think a lot of people are assuming that we have this ongoing continuum from the past and asking, aren't these all solved problems, or why are these things still open questions? And I think the reality is it's just so frickin' early.
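To make the point that ordering matters concrete, here is one hypothetical sketch of assembling structured context before a model call. The convention shown, instructions first and the most relevant material closest to the question, is just one heuristic people experiment with, not a settled best practice, and the character budget is an invented parameter.

    # Hypothetical sketch of structured context assembly; the ordering
    # heuristic and the character budget are illustrative assumptions.
    def build_prompt(instructions: str, question: str,
                     scored_docs: list[tuple[float, str]],
                     max_chars: int = 400_000) -> str:
        # Sort by relevance score, lowest first, so the most relevant
        # documents land nearest the question at the end of the prompt.
        ordered = sorted(scored_docs, key=lambda pair: pair[0])
        parts = [instructions]
        used = len(instructions) + len(question)
        for _score, doc in ordered:
            if used + len(doc) > max_chars:
                continue  # skip what doesn't fit rather than truncating mid-doc
            parts.append(doc)
            used += len(doc)
        parts.append(question)
        return "\n\n".join(parts)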
Let me ask about something that I know you've been paying attention to.
So I think there are, like, two fronts that are interesting with regards to China. One is, like, what the reaction to the hardware sanctions as of last October has been. And two would be, just, like, there's been a ramp, as you might expect, in terms of funding for MiniMax and Baidu.
What's your thinking on all this?
I think it's kind of the expected shift, right?
And so if you look at the Chinese internet or software ecosystem, it's always been walled off from the U.S.
And there's been a focus on building local heroes or local incumbents.
And so that's part of what led to the rise of WeChat and a bunch of different messaging apps because the U.S.-based ones were just blocked in China.
You couldn't use Twitter and you couldn't use Facebook and you couldn't use all sorts of different applications.
And so that gave a lot of room for local incumbents to grow up and really become the dominant platforms in these countries.
I remember, and this may have just been a rumor, when I worked at Google, I visited the China office there briefly when they still had a China office.
And I remember rumors of the fiber to the Google data center getting repeatedly cut as a way to effectively take down the services when Baidu was just getting started. So it sort of gave Baidu space to grow as an alternative search engine.
And again, that may be incorrect.
That's just kind of the rumor that was floating around back then.
And so, you know, you saw the ability to create these local heroes,
and it seems like the same thing is now happening on the LLM side.
So MiniMax raised $250 million at a $1.2 billion valuation. To your point, Baidu just announced a $145 million AI venture fund.
And so it'd be surprising if the Chinese didn't invest very heavily
in creating local heroes in this very important technology dislocation. And this leads back to questions around, well, how does this translate into
different aspects of competition between different countries and regions? And how do you think
about that relative to how you think about what to regulate or not regulate in the context of the U.S.? Because fundamentally, I think these countries are going to go at this area and this technology very hard and aggressively, because it's both very important, but also it can tie into national
security or other concerns. So I think it's definitely worth watching. Yeah. And I think one of the things that's also worth just noting is the profiles of the people starting these companies. The MiniMax team is former SenseTime folks, right? And for our listeners that don't
spend as much time in the, like, Chinese ecosystem, SenseTime was a, like, broadly a facial
recognition company that did a lot of work with the Chinese government, right? And so I think,
you know, one vein of discussion, as part of the debate around regulation and whether or not having these models developed in the United States or in the EU is a good thing, is people asking, well, what does AI with American values in mind mean? I think it's probably very different than something that starts with the perspective
of like a government facial recognition company. Do you have any thoughts on the hardware
sanctions? Yeah. So right now, Nvidia is the, you know, engine for everyone's progress. And, you know, Jensen is laughing all the way to the bank about that, and deservedly so, sitting on his empire. But I do think that if China cannot get A100s, H100s, top-of-the-line
hardware, like GPUs with good interconnect to train models, like they will invest heavily
in solving that problem. And so one of the things that has been quietly happening is investment by the, like, networking and hardware players that are domestic, like a Huawei, in, you know, chips and systems to support AI training.
And this is not like a trivial pursuit, right?
You have a small set of companies that have created like TPUs at Google or other
accelerators that actually work for at-scale transformer training.
But I think it will happen over time, right?
It's not an unsolvable problem with the right talent and enough capital.
And so there is a piece of me that feels that some of the regulatory attempts to control model development are misguided. And also, they don't really take into account, like, what you need to go create these models and what the sort of impacts are going to be, right?
Because what is happening, not that I'm against these sanctions around, like, AI training hardware, is that they're encouraging a domestic industry in the long run, right?
I think separately from that, there's another interesting vein that you and I have talked about, where there are probably two different technical approaches, right? One is figuring out how to make more heterogeneous compute work for both inference and training. And so this is companies like Together or Foundry, or even at the sort of compiler layer, something like Hippo.
And I think that that's interesting, depending on what layer of abstraction you're at, whether that's a compiler or a scheduling abstraction.
But I think that there is more innovation to be had at the infrastructure layers, given it is a constraint to the ecosystem in a way that it hasn't been, at least not in a way that's been, like, paid attention to in a very long time, right?
Yeah.
I guess another big thing that's happened over the last week or two is all the different incumbent announcements, right?
So there's all the different things around Microsoft, Google Duet, incorporation of all these things into various products in terms of AI enablement.
Do you think this impacts any startup opportunities?
I think it definitely does.
It also is to me not that different from the past, right?
Let's sort of like separate into the different components of whether or not this is going to actually work at the incumbents and whether or not that means like there's still startup opportunity around some of these platforms.
Like, there was always risk in building on other people's platforms, right?
Be that Facebook games or being part of the G Suite ecosystem or building on Shopify, right?
Like, this is not a new question for startups.
And I think the name of the game has been like, can you create sufficient value on a broad enough platform that you can break out?
Or is it a big enough independent business, right?
You look at something like Klaviyo and, like, you know, there's both a relationship with Shopify as a platform vendor and, like, it's a multi-hundred-million-dollar business. It looks like there was enough there. And so I think it's, like, very idiosyncratic to every situation.
I think the other consideration would just be like Salesforce is an amazing company in terms of
many dimensions of execution. But it does have a repeated pattern of like announcing, you know,
interesting products that like never seem to see the light of day or get broad customer
deployment, right? And so I think that, you know, to your point, it's a really young industry, and you're asking companies that have become larger and more difficult to drive to move at a rapid pace in the face of technology disruption. And I think you're going to see varied levels of execution on that. So six months in, that's, like, one planning cycle. That planning cycle hasn't been executed at the biggest companies. They're saying they're going to have all of these AI features and AI development tooling if you're Microsoft. And I think, like, we should see
if any of that stuff gets shipped over the next quarter or two.
Yeah, I think it'll take 12 to 18 months for a lot of larger companies to really start to have
anything.
Do you have a point of view on this?
Yeah, I think it's overstated in terms of the impact.
I think it's expected, right?
In other words, if you don't expect Microsoft to incorporate this into every product,
then you're probably misunderstanding what Microsoft is doing.
I said that with no inside knowledge of Microsoft.
I'm just saying they've been very open on it.
They were very early to OpenAI.
They've been moving very aggressively.
It takes time to roll all these things out at scale. Same with Google. They're going to start incorporating it in lots of spots. Now, sometimes
incumbents try to do something and it backfires or doesn't work very well like Google Plus.
And sometimes you do something and it works extremely well because of your distribution
and cross-sell like Microsoft Teams, right? And so I think there's always the potential for some
mis-execution or the potential for bumps along the way. But you should assume that many of the
more savvy incumbents are going to adopt this reasonably soon. And reasonably soon for an incumbent
may mean 12 to 24 months. But if they suddenly cross-sell everything to their existing base, then it can really hamper a startup's ability to function, right? And so as a startup,
you should kind of be asking, well, what will happen if the incumbent adds this? And do I still
have an advantage? And what if it takes three years versus one year? Does that matter? If it matters, then great, you have an opportunity. If it doesn't matter, and two, three years later you can still get really hampered, then you have to rethink what you're doing or ask about how you build
defensibility or what else you build against it. So I, you know, I think this is all expected. And I don't
think it's that surprising. And I think we'll see more and more announcements in the next 12, 24 months from
lots of big companies saying they're doing stuff. Honestly, the thing that surprised me the most was how
fast Adobe moved. Like, I didn't expect them to actually launch products this quickly. And so I was
pretty impressed by that. And Adobe, of course, is one of the better run large technology companies in
the world. And you could almost measure the rapidity with which somebody adopts this technology as a metric of management competence, right? The most competent companies, or at least the
ones closest to this technology will adopt it soonest. And the ones that are still figuring out a
lot of other stuff will take a little bit longer to figure it out, but we should assume this is all
coming, just like mobile. Like, some companies took, you know, 18 months to launch a crappy mobile website, and then three years later, five years later, everybody has apps. And 10 years later, BofA has a great mobile app, right? So, you know, I think these things happen with time. Yeah, we would all buy Elad's ETF that is predicated on signals of actual product execution in AI amongst incumbents rather than announcements, if you're going to put that side hustle together. Yeah. I do think that this
does create new openings to go against certain incumbents that were untouchable before. And I do think
that there is almost this room for a Rippling or Ashby-like or HubSpot-like fat startup approach, right?
So this feels to me like the first time that maybe Salesforce is vulnerable or certain ERP vendors are vulnerable or others where you have a lot of very dense interconnected software with lots of connectors and integrations to other applications as a moat, right?
And the big moat for many of these companies is, A, people know the brand and they want to buy from them and all the rest of it, and they're already running on them.
So it's hard to displace.
But secondly, it's just the breadth of the product plus the breadth of integrations that you have to do if you're implementing an enterprise resource planning system, like an SAP or a NetSuite, right? Or if you're trying to displace Salesforce,
right? And so I do think the fact that you can now build things pretty rapidly that interact
in a really rich way with each other and where you can fundamentally take advantage of sort of the
ability to munge and interact with data and create your own connectors and everything else
that AI gives you, it does seem like it creates an opening for the first time for some of these really broad-based product areas. So I think that's very exciting from a startup perspective. It actually creates some opportunities as well.
Yeah, and so just to dig into that a little bit, the idea being instead of paying for the
million man or woman hours to write painful integration code, maybe much of that can be
generated with AI today, and that's the enabler.
Yeah, it takes six months to roll out SAP.
And one of the reasons it takes six months is big enterprises will hire a consulting company, an Accenture, a Deloitte, whoever, on the other side, to actually write a bunch of connectors into other systems they have.
And then relatedly, there's all sorts of like proprietary views you start building on top of it and
customization.
And those are things that should be extractable and usable from a language model perspective.
So I do think you could imagine both a much faster implementation time to displace something,
but also you should be able to more easily copy over all sorts of customization.
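As a toy version of that idea, here is a hedged sketch of asking a model to draft the field mapping for one connector. The call_llm helper is a stand-in for whatever completion API you actually use, the schemas are invented, and the output would still need human review.

    import json

    def call_llm(prompt: str) -> str:
        # Stand-in for a real completion API (OpenAI, Anthropic, etc.);
        # wire in your provider's SDK here.
        raise NotImplementedError

    def draft_field_mapping(source_schema: dict, target_schema: dict) -> dict:
        # Ask the model to propose a source -> target field mapping for a
        # connector, as JSON. Treat the result as a draft, not ground truth.
        prompt = (
            "Map fields from the source schema to the target schema. "
            "Reply with only a JSON object of {source_field: target_field}.\n"
            f"Source: {json.dumps(source_schema)}\n"
            f"Target: {json.dumps(target_schema)}"
        )
        return json.loads(call_llm(prompt))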
What else are you interested in that you wish great people were working on?
I think one other really interesting area is just the broader enterprise stack, in terms of how can you enable enterprises to use this technology in a really rich way.
And there's startups like Chima and others who are building out different components of the
stack. But I feel like there's four or five different components that you need and you want an
integrated solution. So if you're an enterprise adopting either a proprietary solution, like you're moving from GPT-3.5 to 4, or using a mix of models, or you're using open-source models like LLaMA or others, you basically need to do a lot of different things relative to that, everything from
trust and safety to other aspects of prompt management to other things, right? There's like this
whole stack of stuff. And I feel like there's a lot of point solutions in the market.
And then there's some people who are trying to build a more integrated view of this. And I think
that's a really interesting area. Yeah. I think one of the areas that I am trying to figure out
whether or not it's feasible is just the general area of decision making against like
technical enterprise datasets, right? So, like, two examples of this could be: if you're a large organization, you have a security operations center,
right? You have analysts that process incoming information. They make decisions about how to triage
events. You have a correlation engine in front of that analyst. In theory, the SIEM product does that. In practice, like, nobody likes that. And it's, honestly, super useless because of the false negative and false positive rates on those products. And so whether it's security incident triage,
or, in the DevOps space, like, root cause analysis, or even, like, post-mortem generation as a smaller feature, I think that's going to become really interesting,
especially since there's a lot of cross-team communication
based on different datasets.
And I feel like that's something that's well suited
for LLMs to go make sense of and then explain.
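For the security case, a minimal sketch of that triage step might look like the following; the alert format is invented, call_llm again stands in for a real model API, and the output is meant to feed a human analyst rather than trigger an automated response.

    import json

    def call_llm(prompt: str) -> str:
        # Stand-in for a real model call; swap in your provider's SDK.
        raise NotImplementedError

    def triage_alert(alert: dict, related_events: list[dict]) -> dict:
        # Ask the model to classify an alert and explain its reasoning,
        # so an analyst can review the call rather than act blindly on it.
        prompt = (
            "You are assisting a security operations analyst. Given the "
            "alert and related events, return JSON with keys 'severity' "
            "(low/medium/high), 'likely_false_positive' (true/false), and "
            "'rationale' (a short explanation).\n"
            f"Alert: {json.dumps(alert)}\n"
            f"Related events: {json.dumps(related_events)}"
        )
        return json.loads(call_llm(prompt))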
How do you think about which of those things are just going to be incumbents or startups that already exist adding it, versus new things?
So, for example, when I look at supply chain security software, like what Snyk is doing or what Socket is doing, right? Socket really rushed to add these features in a really smart way, using LLMs to classify, you know, potentially malicious code or issues in software, right? Open-source software. And so it feels to me like a lot of these
players may just be existing players adding that AI capability versus an entirely new company.
Like which of those areas do you think will be entirely new companies?
Yeah, I think part of it is just like what are the data sets you are
operating on and like does the incumbent own those data sets or not, right? One of the interesting
things about the SIEM market is, like, the average large enterprise has, you know, 200-plus security products, many of which pipe data into the SIEM. And so a new player is not necessarily on any weaker footing than an existing SIEM vendor, because they don't own the collection of that data and they're just making sense of it, right? So I think that's challengeable. You know, we're also going to
talk to the founder of Datadog. And I think Datadog is in an extremely good place to actually go attack some of these opportunities, in that they do the data collection natively, right? And they have a very broad product suite. And I think, you know, for any new company coming in, figuring out whether
or not the incumbent can just turn on, like, let's say a feature blade on their existing data is one
thing. But if you're competing with incumbents that don't necessarily have any data advantage and are taking in input data the way you would be, I think that is interesting.
That makes sense.
One other challenge to your point about, like, what an enterprise needs to do,
I do think that, like, annotation and synthetic data and data sharing,
like, continue to be, like, huge blockers for any enterprise, right?
And there are research advancements that make annotation more interesting or RLHF more interesting, and, you know, it's an underserved tooling market, if it is a market at all.
And then there have been companies that have struggled to scale except in specific
verticals around synthetic data and may end up just being like a very, I don't know,
task-specific problem, right, where you can't get general companies out of it.
But I talk to a lot of people who, you know, they have really interesting internal data sets,
these incumbents, and their ability to work on their own customers' data is quite challenged
in terms of privacy and security and the agreements they have with those customers.
Right. And so figuring out how to actually tokenize that data, not in the machine learning sense but in the, like, anonymization sense, or generate synthetic data you can actually use for working with LLMs, I think is, like, I don't know how big of a problem it is, but it's very interesting. And, like, economically, is it economically valuable? I'm not sure. But I do think it's a prevalent and interesting technical problem.
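One minimal sketch of that "tokenize in the anonymization sense" idea: swap sensitive values for reversible placeholders before text reaches a model, then restore them afterward. The regexes here are illustrative assumptions; real PII detection needs far more care.

    import re

    # Illustrative patterns only; production PII detection is much harder.
    PATTERNS = {
        "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
        "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    }

    def pseudonymize(text: str) -> tuple[str, dict[str, str]]:
        # Replace sensitive values with placeholders like <EMAIL_1> and
        # return a mapping so the originals can be restored later.
        mapping: dict[str, str] = {}
        for label, pattern in PATTERNS.items():
            for i, value in enumerate(sorted(set(pattern.findall(text))), 1):
                placeholder = f"<{label}_{i}>"
                mapping[placeholder] = value
                text = text.replace(value, placeholder)
        return text, mapping

    def restore(text: str, mapping: dict[str, str]) -> str:
        for placeholder, value in mapping.items():
            text = text.replace(placeholder, value)
        return text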
Yeah, it's one also where I feel like it's been a longstanding one, just for traditional ML prior to this wave of LLMs.
And it's back to that question of, do LLMs make it different?
Obviously, you can do synthetic data with more intelligence, but it's back to, you know, how big of a real issue is it?
And then also, you know, once you have a bit more understanding of what some of the data actually means, you know, does that impact the world?
Like when I talk to bio people, they still constantly talk about data, data, data, and they don't seem to understand at all the new capabilities of these technologies, right? They're still stuck in the old ML world.
Yeah. I'll describe one last shape of company that I think is like, here's a concrete example,
but then it's quite general.
Like, you know, you mentioned like CRM and ERP, like these core systems,
core enterprise software systems, they are databases of customer and financial records.
And then the workflow in them is, like, update the record based on some action in the world.
Right. And so one of the obvious things to me is to try to figure out whether or not you can take the event that's happening in the world and write that update automatically, right, to a database, in a robust way. I think that's a pretty
big ask, but, like, imagine translating any economic event, right? Eladco paid an invoice or transferred this cash balance from one account to one of its suppliers. The ability to take that and record the transaction from an accounting perspective and then update the financial database,
like I think that would be pretty disruptive. And I've seen different teams begin to try here.
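To make the shape of that concrete, here is a hedged sketch of the deterministic tail end of such a pipeline: once a model (or anything else) has parsed "Eladco paid an invoice" into a structured event, posting it as a balanced double-entry record is the easy part. The account names and schema are invented for illustration; the hard problem being described is the upstream step of robustly turning a messy real-world event into this structured call.

    from dataclasses import dataclass

    @dataclass
    class JournalLine:
        account: str
        debit: float = 0.0
        credit: float = 0.0

    def post_invoice_payment(payer: str, supplier: str,
                             amount: float) -> list[JournalLine]:
        # Paying an invoice reduces accounts payable (debit the liability)
        # and reduces cash (credit the asset); the entry must balance.
        entry = [
            JournalLine(f"{payer}:accounts_payable:{supplier}", debit=amount),
            JournalLine(f"{payer}:cash", credit=amount),
        ]
        assert sum(l.debit for l in entry) == sum(l.credit for l in entry)
        return entry

    # e.g., post_invoice_payment("eladco", "supplier_a", 10_000.0)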
I think it's really interesting.
Yeah, that makes sense. Cool. I think that's all the questions we have for this week. So thanks to everybody for submitting them.
