Tech Brew Ride Home - (Portfolio Profile) Automated Data (Corrected)
Episode Date: November 13, 2023
Our first Ride Home AI Fund Portfolio Profile Episode: Automated Data
Transcript
Welcome to another weekend bonus episode of the Techmeme Ride Home, another portfolio profile episode. Actually, this will be the first portfolio profile episode to feature a company that the Ride Home AI Fund has invested in, although, as listeners know, both funds invest in AI companies as well. Today, we're going to talk to Automated Data,
or ADI, depending on what we end up calling it in this conversation.
We're going to be talking to two founders, Michael and Jason.
Michael, first, hello to you.
Hello, hello. Thanks for having us.
Jason, or should I say JT?
Either one's fine.
I guess it might depend on whether you're angry at me or not.
Well, let's start off with sort of what ADI does.
I should say upfront that if people want to learn more as we're talking, the website is
automated-data.io.
But whoever wants to do it, give me sort of like the two-minute elevator pitch about what
y'all are building.
Yeah, I can certainly start.
This is Mike.
And then JT loves to and does a much better job of completing my sentences.
So why don't we take that approach?
So Automated Data is essentially
building out a data matching platform.
Our goal is to materially automate the matching of data,
which is arguably the most complex part of a data integration task, right?
And what we like to tell people is that we're trying to empower the data citizen
with the tools that enable them to match data in the same way that a very
seasoned, senior data engineer does, or even a data scientist, for that matter.
And the reason we're doing this is, number one,
it's really, really hard, and it's a ubiquitous problem.
If we get it right, we're going to be everywhere.
That's one reason.
But we're also doing it because we have talked to so many firms,
so many clients, for whom this is their biggest challenge around integration.
It's just an exciting thing for us to try to solve.
So that's the journey that we're on in terms of how we think about it.
Like what is the vision?
The vision for me is: install the software in the data center,
point it at the data,
hit go, and it automatically creates, essentially, a knowledge graph of the enterprise.
That's the vision. Obviously, it's much harder to achieve than it is to say. But even
if you dial it back and take a step away from knowledge graphs and the whole AI endpoint,
we're thinking about the data citizen again. We're thinking about how we give
the Mike Roods of the world, the "I don't know SQL, I don't know how to program, I don't even know
how to log in at a command-line prompt" people, what they need.
How do you give me the capabilities to connect data
across the organization and get business intelligence done?
Can I ask for a specific use case?
Let's imagine your dream client of the moment is listening right now.
What is the problem you're going to solve?
What problem do they have right now that they can plug Automated Data
into in a tangible way? Not just, you know,
there's data sitting in various silos.
Give me a tangible, real-world example:
we have this business, and we have this problem.
I love that you start with "there's data sitting in a bunch of silos,"
because I feel like that's
the starting line for like 50 different businesses.
I make the joke, but in some respects,
that is the reality of everyone's data stack.
I think the perfect client is hard.
If I were to fabricate a perfect client, which I might argue doesn't exist,
it would be a business that is empowered, knows how to use data, and can use it.
And to your point, yes, things are fragmented.
I think we've seen time and again that people do try to shoot for the ideal of master data management:
let me put it all in one place and make it all perfect.
I can count on one hand how many times I've heard of that working.
And I'd need a lot more hands and toes to tell you how many millions of dollars it's cost.
But in terms of the perfect situation, yeah, I mean, there's so many, it's hard.
We're working with someone right now.
I'll just give you a real life example.
We're working with someone right now who is a conglomerate.
They have 50 operating units.
They have zero desire to integrate that data.
They just don't want to.
They don't want to disrupt those businesses.
They don't want to, you know, lift and shift all their data centers to one place,
you know, so on and so forth.
So for them, there's a very, very basic need just to connect the things they already have.
Right. It's kind of interesting. You heard Mike do our elevator pitch:
this is very hard,
and a lot of people try to do it.
And frankly, the space has been around for 50, 60 years.
So there's an open question: why hasn't it been solved yet?
But think about the ubiquity of something like VLOOKUP or a JOIN in SQL, very basic things.
Most people listening to this, I would assume, know what a VLOOKUP is in Excel.
If you don't, it's a simple way of associating two values in Excel,
and they have to be exactly the same.
And how often do you have data that is exactly the same, unless you've made it yourself?
So there's so many places.
Like, I joke with my wife all the time that I wish we'd had this tool when we were putting together our
wedding invite list from both of our mothers' lists.
Because you can see where this goes very quickly that we had a lot of overlapping things
between my list and their list and so on and so forth.
So there are a lot of trivial places.
There are so many applications for it.
It's kind of crazy.
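JT's point about VLOOKUP and SQL joins is easy to see in code. The following is a minimal Python sketch, not ADI's method: an exact lookup (like VLOOKUP in exact-match mode, or an equality JOIN) only pairs identical strings, while normalizing the text and scoring similarity with the standard library's `difflib.SequenceMatcher` also pairs near-duplicates. The guest lists, the normalization rules, and the 0.85 threshold are hypothetical choices for illustration.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    kept = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def exact_matches(left: list[str], right: list[str]) -> list[str]:
    """VLOOKUP-style lookup: only identical strings count as a match."""
    right_set = set(right)
    return [name for name in left if name in right_set]


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between the two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def fuzzy_matches(left: list[str], right: list[str], threshold: float = 0.85):
    """Pair each left entry with its most similar right entry, if similar enough."""
    pairs = []
    for a in left:
        best = max(right, key=lambda b: similarity(a, b))
        score = similarity(a, best)
        if score >= threshold:
            pairs.append((a, best, round(score, 2)))
    return pairs


# Hypothetical guest lists kept by two different people.
my_list = ["Carol Smith.", "Bob  Jones", "Jon Smith", "Eve Park"]
their_list = ["carol smith", "Bob Jones", "John Smith", "Eve Park"]

print(exact_matches(my_list, their_list))  # ['Eve Park']
for pair in fuzzy_matches(my_list, their_list):
    print(pair)
```

The threshold is the trade-off: lower it and more near-duplicates pair up, but so do genuinely different people (a "Jon Smith" and a "John Smith" may not be the same guest), which is why a human review step still matters.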
Well, and maybe I am dumb, and in a lot of ways I am,
but I'm going to take the dumb position here and ask,
in simple layman's terms: this sounds like there's all sorts of info that folks have.
To what degree is it a problem of just labeling and identifying?
Like, again, there's this data point, that data point, that data point.
They're either all the same or all completely different, but you have to put them in this bucket or that one.
To what degree is the problem being solved here simply putting a friggin' label on a thing,
a Post-it note, and saying this is that?
It's a really good point. Let me respond to your tagging, or labeling, idea.
There are a lot of different ways to solve this problem. One way is to clean the data, make it
not different, right?
That's a very simple way
you could look at this.
Another way, yes, is tagging, adding another value.
But let's go down your example a little bit.
If I were to tag every single value in,
I'm going to pick a random company,
Apple's client list or contact list,
right?
Imagine trying to tag all of those.
Imagine I want to tag all the Michael Roods out there to figure out which one is our CEO.
Right.
Unless I know more information about the Michael Roods in this database, it's going to be very difficult.
Right.
And, you know, a lot of this started with the census, right?
Entity resolution.
There are a lot of words for this.
There's data matching.
Some say data linking.
Entity resolution is another one.
Right.
And this all started with the census, as did a lot of different
computing developments, right?
They were just trying to figure out:
okay, is this the Michael Rood that lives in New Jersey?
Is it the one that lives in New York City?
Is it the one that lives in Minnesota?
Or is there a chance that's actually the same person three times,
who lived in different places at different points?
So I'm kind of goofing here, but that's the real problem, right?
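The census problem JT describes, where a name alone can't tell records apart, is the core of entity resolution. Below is a minimal Python sketch under stated assumptions: the records are invented, the field weights and threshold are hand-picked, and this is a simple weighted-agreement rule rather than ADI's models or a full probabilistic record-linkage method. Records that agree on more, and more specific, fields are treated as the same person, even when they list different cities.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PersonRecord:
    record_id: str
    name: str
    city: str
    birth_year: int
    email: str


# Hypothetical evidence weights: agreeing on a specific field (email) says far
# more than agreeing on a common field (name) or one that changes (city).
WEIGHTS = {"name": 1.0, "city": 0.5, "birth_year": 2.0, "email": 4.0}


def agreement_score(a: PersonRecord, b: PersonRecord) -> float:
    """Sum the weights of every field on which the two records agree exactly."""
    return sum(w for field, w in WEIGHTS.items() if getattr(a, field) == getattr(b, field))


def resolve(target: PersonRecord, candidates: list[PersonRecord], threshold: float = 5.0) -> list[str]:
    """Return the IDs of candidates with enough evidence to be the same person."""
    return [c.record_id for c in candidates if agreement_score(target, c) >= threshold]


# Invented example: three "Pat Doe" records in three places.
target = PersonRecord("query", "Pat Doe", "Minneapolis", 1970, "pat@example.com")
candidates = [
    PersonRecord("nj-1", "Pat Doe", "Newark", 1970, "pat@example.com"),
    PersonRecord("nyc-1", "Pat Doe", "New York", 1988, "pdoe@example.org"),
    PersonRecord("mn-1", "Pat Doe", "Minneapolis", 1970, "pat@example.com"),
]

print(resolve(target, candidates))  # ['nj-1', 'mn-1']
```

Here the New Jersey record resolves to the same person despite the different city, while the New York record shares only the name and is left out.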
So you asked about examples, right?
I could go to sales and marketing.
A common thing people try to do is the Customer 360, right? How do I understand my customer, all the things they touch, and the ways I have interacted with them, and so on.
That one's hard, depending on your level of maturity as an organization, right?
But maybe this is a topic we'll get into: because we have such wide applicability, a lot of our challenge is
talking to different companies, right?
And understanding what their challenges are.
Mike and I could conceivably, like, Brian, we could pitch you right now, right?
We could say, hey, how are you looking at your podcast listeners, right?
Are you trying to understand their background and demographics?
Maybe you're trying to bring in some LinkedIn data to understand what places they work at.
Maybe that helps you focus the topics of your podcast and how you
invite guests, et cetera.
We just had a whole conversation where, albeit I'm pitching you,
I just gave you a circumstance where you could be joining data
that you otherwise would not have joined, or that would have been too hard to join.
I'll stop there.
Yeah, well, and one of the challenges would be that I have all of these different platforms
that the data lives on, right?
And I don't necessarily want to reinvent the wheel or have to blow up all of my systems
or whatever. So I'm assuming, and I also know, because we've invested and spoken before,
that basically the solution you've come up with allows folks to ingest from their existing
platforms and then move down the chain until it's all matched and put together in your
system, in a way that doesn't, like I said, blow up what works.
Yeah, yeah. I'll kind of walk through the same thing you just did, right?
If you have your podcast listeners, and we'll keep playing this example, let's say they live in HubSpot and maybe Snowflake and some other places, right?
We could connect to those sources. HubSpot I'd probably want to put into Snowflake or another database, but I could connect to those sources.
Our system would read them in. We'd go through a matching
process, and we can go more into the weeds on what that means.
And then you get the opportunity to review them.
So a big piece of this, in terms of our platform: the first piece is, how do I
match the stuff?
And the second piece is, what is the workflow I go through to actually review
it? Maybe for your podcast listeners, you might not care.
You're like, as long as it's reasonably good, I can move on with my life.
So maybe you glance over it and don't really make any changes.
But then we actually allow you to export that back out. There are a lot
of systems, maybe I'm too liberal with the words "a lot," but there are systems
out there that do matching, and then they make you stay in their system, with their own
database, their own this, their own that, and you're kind of trapped a lot of the time.
So that's kind of the difference there.
I mentioned at the start that this is the first AI portfolio company to come on the show.
So there are two parts to this, because the second thing I want to ask you about is you arriving at the AI moment.
But to what degree... I think one of the things you did to get started on this was using transformers to build out a model of what you wanted to do.
So, number one, before we get to the second part about this AI moment: is AI
central, or at least extremely important, to the solution you guys have landed on?
Yeah. And there are a couple of different ways. And you're kind of leading the witness here.
So, number one, yes, to your point, we have models on our platform. Our platform, I should say, is very extensible.
So we've actually partnered up with a number of other people that are also developing models using AI, so on and so forth.
AI being the umbrella term. But we have some of our own things that do everything from entity recognition
through to what you said, sentence transformers, and doing a little bit more semantic
similarity. But that's just one way we're interacting with the AI moment, to use that
lingo. The other way, which is even more significant, is that, given how hard this problem is,
typically the way it's solved today is you hire someone who is a data scientist,
data engineer, or someone familiar with this problem, who understands the problem
space, can analyze data, and is probably technical, right?
And I'll pick on Mike, since he picked on himself before.
That's not Mike, right?
Mike's the business guy.
He knows and understands what the data means, and he'll understand, oh, I'm probably
going to care about the person's address and their name versus their phone number
and their email, right?
So he'll have that context and that domain expertise, but he won't know how to implement it.
Right.
So that's where we're now bringing AI even more into our platform, to say, look, we're giving you all the tools.
But because there's such a learning curve to figuring out how to use those tools, we're using AI to do it for you.
Right.
And I'm not talking ChatGPT,
where I ask it to match stuff for me, because that's a UX thing, not an AI thing, although there's lots of cool AI behind it.
For us, it's more we've developed a way to understand what's in your data.
So really to understand that column A could be called A, but really it's the name of a company,
right?
And column B might be your email or something like that.
And we've started by developing a way to understand what those things are and that model
keeps growing.
So that's part of it.
We're also leveraging it to understand which matching models you should use with that data.
So because I know it's a company name and we have certain expertise, among other things,
we know what models we should be running with that type of data.
And that could be your model, right?
We could extend it to be your model or a partner's model, so on and so forth, or one of ours.
But that's kind of that first way.
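To make the idea of "understanding what's in your data" concrete, here is a minimal Python sketch. JT describes a learned model that keeps growing; this stand-in uses hand-written regular expressions instead, and the type names, patterns, 80% threshold, and matcher strategy names are all hypothetical. It labels each column by its contents rather than its header, then picks a matching strategy per detected type.

```python
import re

# Hypothetical rule-based detectors standing in for a learned model.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[a-z]{2,}$", re.IGNORECASE),
    "phone": re.compile(r"^\+?[\d\s().-]{7,}$"),
    "company": re.compile(r"\b(inc|llc|ltd|corp|co)\.?$", re.IGNORECASE),
}

# Hypothetical strategy names: which kind of matcher to run for each detected type.
MATCHER_FOR_TYPE = {
    "email": "exact_lowercase",
    "phone": "digits_only",
    "company": "fuzzy_with_suffix_stripping",
    "unknown": "fuzzy_default",
}


def infer_column_type(values: list[str], min_share: float = 0.8) -> str:
    """Label a column by the first pattern that most of its non-empty values match."""
    non_empty = [v.strip() for v in values if v and v.strip()]
    if not non_empty:
        return "unknown"
    for type_name, pattern in PATTERNS.items():
        hits = sum(1 for v in non_empty if pattern.search(v))
        if hits / len(non_empty) >= min_share:
            return type_name
    return "unknown"


def plan_matchers(columns: dict[str, list[str]]) -> dict[str, str]:
    """Map each column, whatever its header says, to a matching strategy."""
    return {name: MATCHER_FOR_TYPE[infer_column_type(vals)] for name, vals in columns.items()}


# Columns named only "A", "B", "C", as in JT's example.
columns = {
    "A": ["Acme Inc.", "Globex Corp", "Initech LLC"],
    "B": ["jo@acme.com", "sam@globex.com", ""],
    "C": ["+1 212 555 0100", "(212) 555-0199"],
}
print(plan_matchers(columns))
# {'A': 'fuzzy_with_suffix_stripping', 'B': 'exact_lowercase', 'C': 'digits_only'}
```

Requiring a share of values (rather than all of them) to match lets a column with a few dirty entries still be recognized.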
And then I'll kind of circle around to the end here and say, you know,
and Mike touched upon this in the elevator pitch,
but if we can streamline how you match your data
and make it really easy,
if we can connect to everything,
the latter part that comes out of this
is we start to see the connections between your data,
and I'm saying "connections" while I'm
air-drawing a picture of a graph,
even though people can't see that.
But we're getting towards a knowledge graph, right?
And there's a lot of talk right now around RAG,
and having knowledge graphs empower, you know,
GenAI and pick your favorite other buzzword.
But the knowledge graph space, again, is not a new technology,
but it's starting to get to a point where it's going to have its moment,
where, as people come up the data maturity curve,
that's going to be the new thing people want more of.
So if we can point and shoot and generate one of those, that's pretty cool.
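Going from matches to a knowledge graph starts with turning pairwise matches into entities. One common way to do that, sketched here in Python under our own assumptions (union-find over hypothetical HubSpot, Snowflake, and CRM record IDs; not a description of ADI's implementation), is to treat each match as an edge and group every record reachable through a chain of matches into one entity. Each resulting group could then become a node in a graph, linked back to its source systems.

```python
def cluster_matches(pairs: list[tuple[str, str]]) -> list[list[str]]:
    """Group records connected by any chain of pairwise matches (union-find)."""
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for a, b in pairs:
        union(a, b)

    clusters: dict[str, set[str]] = {}
    for record in list(parent):
        clusters.setdefault(find(record), set()).add(record)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical "source:key" record IDs and matches produced by some matcher.
matches = [
    ("hubspot:17", "snowflake:902"),
    ("snowflake:902", "crm:A-3"),
    ("hubspot:44", "crm:B-9"),
]
for entity in cluster_matches(matches):
    print(entity)
# ['crm:A-3', 'hubspot:17', 'snowflake:902']
# ['crm:B-9', 'hubspot:44']
```

Note the transitivity: `hubspot:17` and `crm:A-3` were never matched directly, but they land in the same entity through `snowflake:902`, which is both the power and the risk of this approach.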
Well, and then, Mike, I'll direct this to you.
The second part is, you know,
one of the things that everybody's looking to do is say,
okay, let's deploy generative AI,
but can you deploy it in my domain?
Can you use what I have?
Can you use what my organization has?
And labeling it, organizing it, and getting it accurate is going to be key to even being able to do that.
If you want to deploy a model in your enterprise and you don't have your data set up,
forget about hallucinations; it's not going to be functional, essentially.
Yeah, that's exactly right.
That's the running joke I'm hearing at a lot of the conferences when I talk to subject matter experts.
The fact of the matter is everyone's seemingly trying to get to the shiny ball,
but they're not putting the foundational work in to actually make that ball.
So you're right.
If you don't put the hard work in up front to make sure that your data
is of quality, that it's complete and accurate and connected,
you're not going to get any of the benefits of the GenAI output.
We were speaking with a firm earlier this morning,
and this is exactly the conversation we had.
They said, we want to get to a place where we can bring GenAI
to help our customers, to help our operational processes,
to help our business.
We're really excited about this momentum.
That's where the conversation
stopped and started, right?
We started talking about where their data is.
It's siloed. It's not connected.
They didn't have a lot of confidence in the quality of the data.
I'm like, well, look, you're going to need this foundational technology that we're building to help you get there.
Right, because...
I mean, Chris and I talk about this a lot in the fund.
It's like, look, all of the biggest models are trained on all of human knowledge, which is great,
because they're trained on the internet and things like that.
But it's sort of like the terroir in wine: the secret sauce for it to be functional for you is not necessarily to be trained on every Reddit comment in the world.
It's to be trained on the applicable data that is actionable for your enterprise.
Exactly. Exactly.
One of the other things we're focused on, just sticking with the AI theme for a moment, is that we've partnered up with some of the folks on the AI side who are laser-focused on
unstructured data. By the way, when you think about all the data that's out there,
there's Bloomberg data, there's Refinitiv data, there's S&P data, all data that's
reasonably well curated, reasonably well connected, with reliable or somewhat
reliable identifiers. That stuff is relatively easy to work with. But then you get into
all the other stuff, which, by the way, is the vast majority of data on earth. It doesn't
have reliable identifiers. It requires probabilistic matching, where the matches are
never perfect, and a lot of it's unstructured.
So the dream, from my standpoint, is really unlocking the power of all that data using
our technology.
You're not alone on that.
I think it was Gary Marcus, or somebody, who had a tweet yesterday that I saw that was essentially
saying, oh, we're assuming that because everything's trained on the whole
internet, it has all of human knowledge.
But he was like, I really believe that 95% of it is either not recorded or, as you're
saying, not labeled
or organized in a way that it's usable, even if you ingest it in some way.
Let me shift gears sort of abruptly here for a second.
To the degree that you're willing to go into it, both of you, what's your
background? Do you have a background in doing startups? I think both of you
have Wall Street backgrounds, finance backgrounds. I'd like to know about your entrepreneurial path.
Mike?
I see J.T. waving at me, so I guess that means I get to start.
It all started at the Minneapolis Grain Exchange in 1986, which actually was an amazing place to start a career.
Exchanges with like legitimate trading pits are pretty cool places.
But yeah, I wound my way out to New York City because all my friends were getting married and having kids,
and I didn't want to do that at the time.
I got to New York City, worked on Wall Street for, frankly, most of my career, had a ton of fun,
and had an opportunity to spin out a tech company from Goldman called REDI.
I was on the executive team there and ran it for a few years until we sold it to Thomson Reuters.
Shortly thereafter, I joined a startup called Crux Informatics.
That's where JT and I met, and that's where I really got my first exposure to all of these big data challenges that we're talking
about. I had a lot of fun while I was there, but, you know, I had the bug to go do something
else, so I left. And this is, I guess, startup number two. It's the first time I'm a founder, though,
and I'm having a ton of fun. You know, there's a no-assholes policy, so we're working with
really good people, we're building something that we believe in, and we're getting a lot of
validation from our clients, all of which makes this pretty exciting. But,
100 percent, most of my career was spent in big companies getting frustrated. Now I get to
work in a small company, being frustrated for different reasons.
JT, how about you?
I like how Mike mentioned our no-assholes policy, which we very quickly amended to "only known
assholes."
That's a good policy, actually.
If we've identified you, if we've labeled you, to keep on the theme here,
you're sort of grandfathered in, yeah.
Yeah, exactly.
I did not start my career at an exchange.
I started my career at a company called FactSet, which is one of the big data companies, up against S&P and Bloomberg and Thomson, which is now Refinitiv.
And I actually managed data as a product there.
That was my first job out of school, and it was owning fixed income content in particular on the FactSet platform.
So, understanding how clients used it, checking for data quality, making improvements, that kind of
stuff. And then I always had an interest in doing quant research. My background's
always been technical, just generally, like, I've been coding for a long time just on my own.
And yeah, I wanted to keep pushing towards quant. It's math and computers and data, and you can
make money at it. Sounds great. And I'm not alone; a lot of people, I feel like, think that way.
And I have changed my perspective on how to approach those things. But I marched
towards that. I was client-facing at FactSet. I helped people build indices and all
kinds of stuff like that. I helped other startups, right? So that was my exposure, going from
a product development side to a go-to-market role and helping a lot of other
businesses, doing art-of-the-possible work, that kind of stuff. And then I did jump to the buy side for a little
while and did quant research at a fund. Quant research is kind of like data science.
It means a lot of things and it kind of depends on where you are. And because I was at a small
firm, it meant everything under the sun. So I was building products and developing and doing
research and, you know, a little bit of everything. And then, yeah, from there,
I was looking for a new job. The fund is doing so well. This is when, you know,
passive was dominating and active was going down, and this whole thing with
ETFs and blah blah blah.
And yeah, I was looking for something else.
And I like the pace of startups, at least the stereotypical notion of it.
So yeah, I stumbled upon Crux, right?
This was very early in Crux's journey, before they hit their Series B,
before they landed Two Sigma, their giant
billboard client, which was a whole story in and of itself.
But yeah, so I was there. I was the first solutions engineer, laid the groundwork,
built that out, and hired a ton of the people who worked there at the time. Then I got
yanked over to data engineering and built that, and then I got yanked somewhere else and
kind of floated between titles for a little while and just built out products.
And yeah, then after Crux, I ended up jumping over to Palantir for a little while. My
mini tangent story is that if you work at Palantir and then you leave and set your
status to stealth, you get a lot of inbound on LinkedIn, like a lot of inbound, which happened.
I met a friend or two that way, which was kind of cool.
But yeah, so I jumped out of Palantir to come be with Mike
and do ADI. So that's my story.
Let me use Palantir to segue into a couple of macro questions.
We've been hearing about things like big data, data being the new oil, and all that
stuff for more than a decade now, and managing the infinite amount of data that is being generated,
to make it valuable as a commodity that enterprises can act on, is not new.
I'm curious now, again, to bring it back to the AI model:
if it was already a big, difficult problem to solve, a giant bear to wrestle,
is it just getting even bigger now?
Because essentially the AI moment is a different flavor of
taking data and making it actionable for enterprises to use in whatever way is useful.
Yeah, I think the very simple response is yes.
I don't know if you follow Scott Taylor.
We're not related at all, by the way.
We made that joke to each other.
Scott Taylor talks about master data management, etc.
But his take on data as the new oil is that he's actually rebranded it as data as the new bullshit,
which is also an appropriate
phrasing of this.
Well, there's often a giant pile of data
and a giant pile of bullshit.
Yes, exactly, exactly.
But yeah, I mean, like,
cue the generic sales pitches, right?
Data is growing over time, and it's still growing,
and we are not using any of it, and blah, blah, blah, blah, blah,
and here's an iceberg picture, right?
We've all heard that story a
million times. How are we chipping away at it? Right? I think that's the bigger question.
I don't think anybody's going to disagree that there's more data out there. We're writing more
programs and building more things. You know, I can probably get a puppy delivered to me through
my phone if I wanted to, right? There are apps and things for everything, and we're being
tracked all over the place. Right. So there's all this data, great, but who's using it? Right. And
Are you leveraging it?
And then...
And it's always...
And effectively, also.
Exactly.
Exactly.
And there are tons of calls to action.
Like, you know, enter all the consulting firms: how to leverage your data,
and blah, blah, blah.
Right.
And...
But it is a valid question.
Like, a guy I talk with on a regular basis, Joe Reis, co-authored the book on data engineering with another gentleman,
Matt Housley, right?
Fundamentals of Data Engineering, which is a tremendously well-respected book.
They're writing a whole book on business value.
Just, just, you know, how do we talk about value,
especially with respect to data, right?
And it's interesting, and I keep pushing them
on the fact that it's kind of comical that nobody talks
about the value of accounting to the business, right?
We know what that is.
Like, how do we not know the value of data?
And, you know, this all kind of cycles back to maturity
and, you know, how we think about, you know,
the data we have and what we can do with it.
And, you know, that brings us to
our art-of-the-possible sales cycle and what that looks like.
One more sort of big question.
For the last two weeks on the show, I've been talking a lot about the whole debate in AI over whether,
okay, maybe OpenAI, or whoever has the largest model, is just going to win everything.
By the way, to note, this is November 7th, so this is the day after the OpenAI
Developers Conference.
What are you seeing?
I want to get back to the debate between open source and, like, a big silo, Microsoft sort of play, in terms of the clients you're talking to.
But what are you seeing?
In the folks you're talking to as your first customers, are you seeing that there will be one model to rule them all,
that it'll be that enterprise play where it's like, well, we go with IBM, if this were 40 years ago? Or do you see this as a marketplace where it's not just one model, where there will be different flavors?
I'm biased to the different flavors. Mike, did you want to jump in?
No, I was just going to mention that we are firmly of the opinion that the further we progress, especially with AI technologies, the more fine-grained use cases for matching there will be.
So here's what's going to happen.
I'm going to go back to your point about data.
You're going to have more data, and more firms
are going to want to leverage that data. They're going to leverage that data to power their
business and their operations. Much of that will be driven through some sort of AI solution.
And so to me, that means you've got an increasing number of matching use cases. And by the way,
every matching use case is its own rabbit hole, which is going to require its own reference data,
and perhaps its own bit of AI logic stuck in there somewhere to make it more efficient. So we think of
this as a majorly growing problem, and it's one of the reasons why
we jumped in. I will also say, and then I want to hand it back to JT, that we've been
using some of these AI solutions to help with our matching, and we do not find them to be
particularly helpful for certain types of matches, especially entity resolution and things
like that. We find that the approach we're taking, which includes them,
is more of an ensemble approach, right? We've created this concept of pipelines where we use
different matching models, sometimes daisy-chaining them, sometimes creating
new paths of logic based on if-then statements, things of that nature. That gets us to a much
better matching result than some of these AI technologies.
Yeah, I agree with the points that
Mike made, broadly: more data, more models, etc. I think there's an
interesting one that we're already seeing today, and maybe it's just because it's
early days, right? But one model after another comes out, right? And none of these models
are the answer to all problems, right? And sure, Skynet sounds great. So does the Hitchhiker's Guide and
all these other things that presumably could... I mean, it's funny, I use the Hitchhiker's Guide as an example,
but even that couldn't do everything. It could tell you everything, but it couldn't do everything, right?
But, you know, sci-fi often drives a lot of these things forward, right?
If you look at some of the reading out there, et cetera.
But yeah, it's hard to believe that there's going to be one model to rule them all, so to speak.
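Mike's description of pipelines that daisy-chain matching models with if-then logic can be illustrated with a small Python sketch. Everything here is an assumption for illustration: the stage names, the hypothetical `entity_id` field, and the 0.9/0.5 thresholds are ours, and the stages are simple rules rather than the AI models Mike mentions. Each stage either decides or passes the pair along, so cheap, reliable evidence (a shared identifier) is checked before fuzzier evidence, and anything still undecided goes to human review.

```python
from difflib import SequenceMatcher
from typing import Callable, Optional

Record = dict[str, str]
# A stage returns True (match), False (non-match), or None (undecided: pass it on).
Stage = Callable[[Record, Record], Optional[bool]]


def identifier_stage(a: Record, b: Record) -> Optional[bool]:
    """If both records carry a hypothetical shared identifier, trust it and stop."""
    if a.get("entity_id") and b.get("entity_id"):
        return a["entity_id"] == b["entity_id"]
    return None


def exact_name_stage(a: Record, b: Record) -> Optional[bool]:
    """Case-insensitive exact name match; never rules a pair out on its own."""
    return True if a["name"].lower() == b["name"].lower() else None


def fuzzy_name_stage(a: Record, b: Record, high: float = 0.9, low: float = 0.5) -> Optional[bool]:
    """Decide at either extreme of similarity; leave the middle band undecided."""
    score = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    if score >= high:
        return True
    if score < low:
        return False
    return None


def run_pipeline(a: Record, b: Record, stages: list[Stage]) -> str:
    """Daisy-chain the stages: the first decisive answer wins; otherwise ask a human."""
    for stage in stages:
        decision = stage(a, b)
        if decision is not None:
            return "match" if decision else "no match"
    return "review"


STAGES: list[Stage] = [identifier_stage, exact_name_stage, fuzzy_name_stage]

print(run_pipeline({"name": "Acme Corp", "entity_id": "X1"}, {"name": "ACME Corporation", "entity_id": "X1"}, STAGES))  # match
print(run_pipeline({"name": "Acme", "entity_id": "X1"}, {"name": "Acme", "entity_id": "X2"}, STAGES))  # no match
print(run_pipeline({"name": "Initech"}, {"name": "Umbrella"}, STAGES))  # no match
print(run_pipeline({"name": "Acme Corp"}, {"name": "Acme Corporation"}, STAGES))  # review
```

Ordering is the main design choice: because the identifier stage runs first, two records with different IDs are rejected even when their names are identical, and a suffix-stripping stage could be slotted in before the fuzzy stage to settle cases like "Acme Corp" versus "Acme Corporation".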
And to what degree, you know, and again, I'm sure you're not regular listeners of the podcast,
but from the AI conference I went to last month
to the things we've been talking about in recent weeks,
to what degree is there the potential for a thousand flowers blooming
if this is an open source movement, in the sense that,
yes, I can go to Microsoft, OpenAI, Meta,
maybe Elon's thing or whatever, and I can use it as an enterprise plugin.
Or it doesn't necessarily have to be on-premise or anything like that, but it's fine-tuned to me
in a way that it's not just plug-and-play. I am leading the witness again,
but I'm getting the sense that one of the things that will empower clients of
yours and other people in this AI moment is that it will allow everyone to create their own
secret sauce for their own enterprises and use cases,
so that you'll want to control and refine how your AI works,
because that will be so key to how it's deployable for your operations
and for your customers.
Yeah, I think there's potentially a natural evolution here,
which you're getting at, right?
We haven't really had anything before.
Well, that's not entirely true.
We've had a lot of stuff that was probably under the covers,
that wasn't as transparent.
It wasn't this gen AI moment we have right now.
So yes, as we incorporate more data
and teach it more things, adding more information to that mix
is inevitably going to produce better results
that are more focused on our particular outcome.
I could make the argument, though, that, okay,
we go through a period of time where everybody's fine-tuning their own.
And then at some point past that,
there's so much information available
that I don't need to fine-tune anymore,
because I've now unlocked more information.
I think part of this, and this is a little bit of a repeat,
right, is that data foundations are such a big, important thing.
The only reason these models have to be trained and fine-tuned
is because not all the world's data is publicly available, right?
Shock and awe, but, yeah, it's not.
So because it's not all publicly available, I do have to go and find a model, whether it's
public or not, and that typically just depends on interoperability and
commercial terms, among other things, right?
Sometimes it's performance, sometimes it's not.
But by and large, the big models these days are all kind of within earshot of one
another. But then it's like, okay, I want to use my own proprietary stuff. But conceivably,
if all of that information was out there... I'll pick an arbitrary
example. If I loaded up the latest and greatest LLM with a ton of finance books, and it's probably already
trained on this stuff, and, you know, the
curriculum from the CFA, and people love and hate that thing, but...
BloombergGPT being...
Exactly, exactly right. It's exactly that point, right? Bloomberg loaded
it up with all of its stuff that it usually charges people an arm and a leg for, right? And people willingly
pay because they don't have access to that stuff otherwise. But at what point will that data become
a little more commoditized, to your "data is the new oil" point, right?
There's a growing amount of it out there.
People are going to start figuring things out from other angles, right?
There might come a point where you don't need to fine-tune it, number one.
But, I mean, let's play out this argument.
This isn't entirely true, but say machines can only get as smart as humans are,
or as dumb as humans are, depending on your perspective, right?
If they can only get up to our level... We ourselves are
trained. Mike is trained on, you know, high-frequency trading and electronic trading, right?
He knows that space. I know quant research and some other stuff, right? So
even I can only consume so much. Machines can obviously consume a
lot more, but then let's play a different argument. They're all based on statistics,
and you have to infer context, and that's stupidly hard. We were listening to
Bill Inmon, who's a well-known figure in data warehousing.
He created the data warehouse.
He's literally the creator of it.
He talks about how, if I said I was going to fire you, right,
does that mean I'm going to light you on fire?
Does that mean I'm going to shoot you out of a cannon?
Does that mean I'm going to remove you from the job?
Right.
Like, who knows?
And, you know, somewhat obviously, machines
may or may not be able to figure that out based on other data points.
But again, it's somewhat of a circular answer.
We're coming back to the fact that lack of data becomes the problem.
So we'll see how things evolve.
I don't see people pushing Bloomberg data onto the internet anytime soon.
That is going to end up in some lawsuits.
Yeah, I mean, we haven't even talked about the legality and the IP of that sort of thing.
Michael, I'll let you answer and have the final word on that.
But basically, it's not going to be that the biggest model
will always be the best.
It's not one model to rule them all.
Essentially, we're all betting that there's a certain amount of refinement
and a certain amount of, I keep saying secret sauce,
but let's call it expertise, domain expertise.
And you only get out what you put in.
If you just put in everything,
that's not necessarily going to win.
Let me take myself off mute.
Yeah, no, absolutely.
These are big, difficult topics to take on,
and one of the challenges we have when talking to clients
is that they're scared of these daunting projects.
They're scared of these big things.
And so the way we try to come at it is to
be very targeted, very narrow in our approach.
Like, hey, Mr. Underwriter, you have a problem making sure that
you understand all of the entities you're insuring right now and
the risk that you have.
You want a 360 view of each of those,
so you know all the tertiary risks you're potentially exposed to.
Wouldn't that be a great thing to solve?
Heck yeah.
So, you know, we come in and tackle that very specific problem with a very specific set of
matching algorithms that are focused on, and trained on, private company data.
And then eventually the guy turns around and says, you know what, I also have a problem with
this other type of data I'm dealing with.
And so for us, it's, you know, "one match at a time," which is one of the expressions
we like to use.
You're never going to get there by boiling the ocean.
You've got to get there one step at a time, and one model at a time, to your point.
So, you have a background on Wall Street,
and you just mentioned underwriting there.
If I'm listening right now and I'm at, I don't know, a hedge fund, an insurance company,
a Fortune 500, a startup, who's listening right now that should get in touch? Get in touch with me,
and I'll put you in touch with these folks.
But who should get in touch and see what Automated Data can do for them tomorrow?
Yeah.
Thanks for asking that question.
So JT said it too: every department within every company has this challenge. I think we're probably all most familiar with the challenges that marketing has, right, in terms of their customer databases, really focusing their campaigns, and making sure that they're not sending two emails to the same person, those sorts of things. So that's a really obvious area. But frankly, every department in every company has these sorts of matching challenges, so we can kind of
start everywhere. What I would tell you is that the repeatable motion is typically around
companies, private companies in particular, because that's one of the hairier areas of
entity resolution. There are only a few firms out there that try to tackle this, and I won't use
their names, but they don't do the best job in the world, and getting an accurate 360-degree
view of private companies from those sources is not going to happen. So
we're working with a bunch of nimble tech companies and data companies that are trying to solve this
kind of as a group. So private companies is one of the areas we're really focused on.
We're focused on the commercial real estate side for the same reason. Our head of product actually
came from commercial real estate. He understands just how awful that data environment is, and talk about
unstructured data: you've got mountains and mountains of it. So those are the areas. For your listeners
who care about private companies, or care about really hairy data around commercial real estate and those sorts of things, maybe even spatial use cases, which seem to be another one gaining traction, those are probably areas where we could help.
Well, a reminder again: if you want to check it out on the web, it's automated dash data.io. But as always, email me, Brian at Ride Home Fund, and I'll put you in touch
directly. If you have an immediate use case or an immediate problem to solve, letting us know that
would be the best way to pass it along. Jason, Michael, anything else that people should know
if they are excited about this and want to get involved? Are you hiring? Are there introductions
to folks that you'd love to have? Anything? JT, why don't you go?
I think the thing, and I like to talk about this a lot in general, and it's kind of
my, and I would argue our, sales philosophy,
is just to talk, right?
Just because you get us on the phone
and we talk through your problems,
I'm not going to jam our product down your throat, right?
I've been on the receiving end and on the delivering end of these sales pitches too many times.
And, you know,
kudos to salespeople out there, it is a hard job,
but my sales approach, and our sales approach generally, is not to push. We're not pushy salespeople.
I think we're more interested in genuinely helping you solve your problems.
That's why we end up pulling in lots of other partners and lots of other solutions.
We'll make recommendations.
We're here to help you.
I guess that's my more disarming pitch out there: you know, we're happy
to talk, right? There are a lot of people who are still trying, kind of where we started,
to understand what's possible and what they can do, right? And sometimes you just need to talk to
somebody who knows how to do those things or can explain what's possible.
Come lie down on the therapist's couch and tell us your problems or something.
Kind of. Or meet me at the bar and I'll buy you a beer.
There you go. There you go. Jason suggested
that maybe we should do these episodes at a bar.
Maybe we'd get deeper and the conversations would flow a little looser.
But this was a fantastic conversation, alcohol or no.
Michael, Jason, thank you so much.
Everyone, check out ADI, Automated Data, at automated dash data.io.
I am happy that y'all were the first from the AI fund to come on the
podcast, so thanks for kicking that off.
Thank you. Thank you. This was a lot of fun. Thanks,
Brian.
