Utilizing Tech - Season 7: AI Data Infrastructure Presented by Solidigm - 2x11: Using AI to Assess Unstructured Data with Concentric
Episode Date: March 16, 2021. Most organizations have a vast amount of so-called unstructured data, and this poses a major risk for operations. But what if there was an AI-powered application that could sift through all this data, categorize it, and determine the risk profile for everything? That’s the promise of Concentric AI, and the premise for this episode of Utilizing AI with their CEO, Karthik Krishnan. The company uses a deep learning model trained on a vast pool of data from the Internet to create “Concentric Mind,” which can identify documents across many business verticals, and this is continually tuned based on the results at each new customer environment. It also includes a language model to identify clusters of documents thematically. Guests and Hosts: Karthik Krishnan is CEO of Concentric. Connect with Karthik on Twitter at @KK_Karthik. Chris Grundemann is a Gigaom Analyst and VP of Client Success at Myriad360. Connect with Chris at ChrisGrundemann.com and on Twitter at @ChrisGrundemann. Stephen Foskett is Publisher of Gestalt IT and Organizer of Tech Field Day. Find Stephen’s writing at GestaltIT.com and on Twitter at @SFoskett. Date: 3/16/2021 Tags: @SFoskett, @ChrisGrundemann, @KK_Karthik, @IncConcentric
Transcript
Welcome to Utilizing AI,
the podcast about enterprise applications for machine learning,
deep learning, and other artificial intelligence topics.
Each episode brings in experts in
enterprise infrastructure to discuss applications of AI in today's data center.
Today, we're discussing a very practical application of AI,
and that is assessing and understanding
risk of unstructured data.
First, let's meet our guest, Karthik Krishnan of Concentric.
Thank you, Stephen.
Good morning.
My name is Karthik Krishnan.
I am the founder and CEO of Concentric AI.
You can find me on Twitter at KK underscore Karthik.
Thank you very much for having me.
I'm Chris Grundemann.
In addition to being the co-host today,
I am also an independent consultant,
content creator, coach, and mentor.
You can learn more at chrisgrundemann.com.
And this is Stephen Foskett.
I'm the organizer of Tech Field Day
and the publisher of Gestalt IT,
also the host of Utilizing AI every single week.
You can find me on Twitter at S Foskett.
So, Karthik, to kick things off here, my background is in enterprise storage.
I'm kind of Mr. Storage. I love the storage.
But one of the challenges of enterprise storage is basically what we call unstructured data, which is a great euphemism
because unstructured data really kind of means what it sounds. It basically means big piles of
files. Most companies have either giant file servers full of stuff, or nowadays they have
Dropbox or box.com or whatever. And basically it is kind of a mess. And in my background in storage,
that's always been a challenge. One of the cool things about AI is that AI has this capability
to kind of search through vast amounts of data and apply heuristics to figure out what the data is.
That's essentially what Concentric is doing, isn't it?
That's absolutely right, Stephen. Just to back up a little bit, I think it's important to kind of understand what unstructured data is and what the differences
are and consequently the complexity from it. Unstructured data is really by definition
any data that doesn't have a predefined data model. So compared to your
relational databases where you have schemas and you put data where you know exactly what you're
looking for, as you correctly pointed out, unstructured data tends to be documents,
tends to be files, tends to be information that users create, modify, duplicate, provide access permissions to people on a daily basis.
And it typically tends to have two challenges.
One, just in the fact that because it doesn't have a predefined data model,
it's pretty much the Wild West.
You have huge data stores, multiple tens of terabytes, sometimes hundreds of terabytes of data where pretty much
every file, every data element has its own schema. It has its own information.
The import and the meaning underneath the data itself tends to often dictate how critical or
business sensitive it is. And the other aspect is compared to relational databases, which tend to be tightly controlled with very strict API access, unstructured data tends to be in the hands of end users and your employees.
And they're creating, modifying, duplicating, sharing content, both inside as well as outside the company.
And that tends to add a completely different risk layer compared to structured data.
That's super interesting, actually.
And I'm definitely the storage noob on the call here.
And so to me, when I first heard about unstructured data, that's not what I immediately jumped
to is this idea of files, right?
Because from a user perspective, I spent a lot of time structuring that data, so to speak,
writing these documents, putting these Excel files together, printing out PDFs.
These are contracts. This is intellectual property. There's tons of information here and labeled. And then,
of course, from a security perspective, we all know that this data should be classified in some
way. We should understand what's sensitive, what's not, what has personally identifiable
information and what doesn't. But that classification is really hard for most folks and
most times, right? Even just knowing what the classification scheme should be is hard, but then so is
applying it, especially if you already have a bunch of documents out there. So, you know, I'm
immediately seeing the benefits that potentially putting AI into this could have, which is,
I'm assuming, is automatically classifying this data and understanding what risk level is involved.
How do you go about doing that?
Yeah, so that's absolutely right.
That's exactly the problem that we set out to solve at Concentric, which was to help enterprises discover all their business critical content, identify risk to it and protect it. And the specific sets of challenges that we have set out to solve in the realm of unstructured data is this notion that if you think about a file,
in the world of unstructured data, the complexity in a file comes from the import and the meaning
underneath the data itself. So if you think about it the human way, we would look at a document and say,
yeah, this is a
business critical document. This is a contract, or this is an M&A document, or this is a financial
document. Now, humans possibly cannot go through hundreds of terabytes of data and do this sort of
manual sifting and saying, okay, these are all my contracts. These are my M&A documents. And so
traditionally, the way people have done it is using
rules and regular expressions. So you write a rule that says, hey, I'm going to look for the
word contract in a contract document. Now, it's very quickly apparent that the challenges with
natural language processing and unstructured data is that sort of an approach tends to be very
limiting because of very simple challenges that natural language
processing, that natural language presents, which is, I'll give you two simple examples.
One is polysemy.
Polysemy is the same word can mean completely different things depending upon the context
within which it's used.
For example, if you use the word architecture, architecture can reference a next generation
software, can be referenced in the context of a next-generation software design document,
or it could be the architecture of a building.
So if you search for the word architecture,
you're going to pull up every document
without the context within which the word is actually used.
The other aspect of this is called synonymy,
which is when different words mean the same thing,
like if you use the word outstanding or excellent,
they mean the same things.
And so they may be used in the same context,
but unless you searched for the word outstanding
and excellent, you're not gonna get all of the information
that you really, really care about.
And so understanding the context of a file or a document
is super critical to understanding the meaning
of what it is you're looking for.
And so that's what we do where we bring deep learning to this problem, where we use deep
learning as a form of natural language processing to really understand the context within which
the words are used within a document to help understand the broader and important meaning
of it, to help enterprises understand where all of their business critical information
might be from business confidential
data to financial data to intellectual property to even privacy data like customer data and so on.
And so the idea there is to go beyond words and regular expressions to really understanding the
meaning of a document without the customer having to a priori know what it is that they're looking
for or define a lot of these complex rules that they have to end up writing.
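To make the contrast with rules and regular expressions concrete, here is a toy sketch. This is not Concentric's actual model; the documents, the hand-written sense vocabulary, and the function names are all invented for illustration. It shows why a bare keyword match pulls in every sense of a polysemous word like "architecture," while even a crude context check narrows the results to one sense:

```python
import re

# Four toy documents: two senses of "architecture", plus a synonymy pair.
docs = {
    "d1": "the architecture of the new office building features glass walls",
    "d2": "the software architecture document describes the next generation design",
    "d3": "payment on the outstanding invoice is due at the end of the month",
    "d4": "her performance review this quarter was excellent across the board",
}

def keyword_search(term, docs):
    """Naive rule: match the word anywhere, regardless of sense."""
    return sorted(d for d, text in docs.items() if re.search(rf"\b{term}\b", text))

# Hypothetical hand-written sense vocabulary; a real system would learn
# these context associations from data instead of hard-coding them.
SENSE_CONTEXT = {"software_architecture": {"software", "design", "document"}}

def contextual_search(term, context_words, docs):
    """Match only when the term co-occurs with words signaling the right sense."""
    hits = [d for d, text in docs.items()
            if term in text.split() and set(text.split()) & context_words]
    return sorted(hits)

print(keyword_search("architecture", docs))  # both senses: ['d1', 'd2']
print(contextual_search("architecture", SENSE_CONTEXT["software_architecture"], docs))  # ['d2']
```

A deep language model learns these contextual associations rather than relying on a fixed sense vocabulary, but the failure mode of the keyword search is the same one described above.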
And this is really a long-term challenge for enterprise storage. This is something that has
been plaguing us literally since as long as there has been, you know, shared storage, you know,
servers and so on. I mean, you know, the majority of data in many modern
enterprises is what we would consider to be unstructured data. And for the most part,
even though companies, I got to give them credit, companies have made a valiant effort to try to
encourage people to use standards and use classification and tag things and so on. It just hasn't happened.
And so most companies, I think probably maybe even all companies have just a massive pool of
storage that they're not sure what it is. And as Karthik was saying, right now, the major way that
people deal with that is basically they have tools that go through and try to extract
something. And then they'll just do kind of interactive iterative searches against that
and say, you know, okay, find me everything that has this in it. Okay, now find me everything that
has this in it. Okay, now try to do this. Now try to do that. And there's a whole industry of
software that does this. And that's why when I talked to Concentric originally, I was, you know, it really turned on a light bulb in my head, because the ability of a computer system to
go through and search through this stuff. I mean, just imagine if a robot could go through your
basement, organize everything and figure out what was useful. I mean, that would be tremendous,
wouldn't it? I'd buy that robot. I'd buy that robot.
You guys are working on robots that do basements too, right, Karthik?
Is that the next product?
Not quite.
But yeah, you're absolutely right, Stephen.
I think that the challenge has traditionally been this conundrum, which is, ideally, the
IT teams are responsible for the security and managing the risk to your important data.
And it would be good to do it centrally.
But the challenge with this is, in our analysis, we have found 90 plus thematic categories of data that enterprises have to worry about.
It's just impossible for enterprises to write all the possible rule combinations that are needed to essentially be able to corral all of this information.
So what they end up doing is you end up then saying, okay, I can't do it centrally. So I'm
going to rely on my end users to self-identify what's important, not important. End users have
a day job. Security is not at the forefront as they are going about their jobs. And we were
deployed in a hedge fund, about 200 people, they're doing classification.
And the CISO told me, look, even at about 200 employees, I can't rely on my end users
to identify and make sure that all of this information is classified correctly.
And so that's the conundrum, which is, how can you provide solutions that allow enterprises
to do it centrally without having to rely on your end users and yet give IT administrators, security teams,
the tools to be able to do it centrally
without them having to go write all of these complex rules
and regular expressions to essentially be able
to inventory all of the data.
And so that's what we have set out to do.
And that's what we do,
which is really giving enterprise teams the tools to be
able to centrally do all of these things without having to rely on your end users. But discovery
is really only one part of it. I mean, the second part that is super important is then quickly
identifying the risk to it, right? So for example, one of the things that we do, which we're pretty
proud of is we do peer document comparison. So I can take a contract and I can identify all the
derivatives of that particular contract. And I can look at how all of them have been shared inside
the company or outside the company. And I can say, hey, here's this one document that has been shared
outside the company where all of its peer documents have not been. And I can surface that in the form
of a risk index. And so very quickly, you've gone from just giving customers an inventory of all of
their data to really helping them where it matters, which is identifying risk to their data,
and then essentially being able to help remediate. So you're helping significantly lower the odds of
data loss, which I mean, ultimately, is the business that we are in.
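The peer-document comparison described here can be sketched in a few lines. This is a simplified stand-in, not the product's actual risk index; the file names and the single `shared_externally` attribute are invented for illustration:

```python
from collections import Counter

# A toy peer group: derivatives of one contract, each with a sharing attribute.
peer_group = [
    {"doc": "msa_v1.docx", "shared_externally": False},
    {"doc": "msa_v2.docx", "shared_externally": False},
    {"doc": "msa_v2_redline.docx", "shared_externally": True},   # deviates from peers
    {"doc": "msa_final.docx", "shared_externally": False},
]

def flag_sharing_outliers(group, attr="shared_externally"):
    """Flag documents whose attribute deviates from the peer-group majority."""
    baseline, _ = Counter(d[attr] for d in group).most_common(1)[0]
    return [d["doc"] for d in group if d[attr] != baseline]

print(flag_sharing_outliers(peer_group))  # ['msa_v2_redline.docx']
```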
Yeah. So I wonder if we can dive a little bit under the hood here, because I mean,
I definitely see the value is very interesting. But as you talk about this, right, I think we
all agree on how complex of a challenge this is and the reason why humans don't do a very good
job of it. But the other part is the reason that humans do do a good job when they're looking
at a single document is that context. And you said, I think you said there's 90 different
variants you've found, and it's all about this context. And I wonder, how do you
actually train a machine, a program, an application, to understand human context? What are the nuts and bolts of making this work?
Right. So the essence of what we do is this idea that words are not the ultimate atom of meaning within a
document, because words have to be placed within the context of a sentence, a paragraph, and really
understanding the structural associations of how the words are used within a document is really
what gives you the broader import and the meaning of the document itself.
And so what we do is we analyze a document. So we go through,
we go through the files to really try to understand,
use language models to understand the associations within a document to then
essentially be able to derive a mathematical representation of the document to say, okay,
this is what the essence of this particular document is. And we use those mathematical
representations to then create thematic groupings of data. So completely unsupervised in our system,
you'll see a cluster grouping of NDAs, a cluster grouping of contracts, a cluster grouping of
M&A documents, just to pick on a few categories,
that comes from a deep understanding of the data
within that particular file or document itself.
And then once we've done that,
we use more than about 400 data models
to essentially be able to also give you
a thematic view into that data.
So tell you, these are contracts, these are NDAs,
these are trading documents and so on.
And that comes from deploying language models against the data to help essentially categorize
and mine for risk within the data itself.
And that's where deep learning is super useful because deep learning gives you the ability
to do this at scale and also give you a very rich representation of a file or a document
to essentially be able to capture the essence and the meaning of that document itself.
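As a rough illustration of the unsupervised thematic grouping described above — using trivial bag-of-words vectors and a greedy one-pass grouping instead of the deep-learning representations Karthik describes, with invented documents and an assumed similarity threshold — documents about the same theme end up in the same cluster purely from their word associations:

```python
import math
from collections import Counter

# Toy corpus: two NDAs and two invoices, identified only by their words.
docs = {
    "nda_1": "confidential information disclosed under this nondisclosure agreement",
    "nda_2": "the receiving party shall keep disclosed information confidential nondisclosure",
    "invoice_1": "invoice total amount due payment net thirty days",
    "invoice_2": "payment of the invoice amount is due within thirty days",
}

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    return dot / (math.sqrt(sum(v * v for v in a.values()))
                  * math.sqrt(sum(v * v for v in b.values())))

def cluster(docs, threshold=0.3):
    """Greedy one-pass grouping: join the first cluster whose representative
    vector is similar enough, otherwise start a new cluster."""
    clusters = []
    for name, text in docs.items():
        vec = Counter(text.split())
        for c in clusters:
            if cosine(vec, c["rep"]) >= threshold:
                c["members"].append(name)
                break
        else:
            clusters.append({"rep": vec, "members": [name]})
    return [c["members"] for c in clusters]

print(cluster(docs))  # [['nda_1', 'nda_2'], ['invoice_1', 'invoice_2']]
```

A production system would use learned dense embeddings in place of word counts, but the principle — thematic clusters falling out of document similarity with no labels — is the same.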
And so it's based on essentially training with a big volume of data, is that right? So
where does the training come from? What is the training material?
Yeah, so the language models that we have trained on,
I mean, initially we've trained in the wild, right? So you use language models that have been trained
on the World Wide Web,
which is the greatest repository that there is
of just language itself.
But what we do that's unique on top of that
is we use what's called Concentric Mind,
which acts as a centralized data intelligence service that, as we are building these models up,
and we're seeing sort of unique categories of data within customer environments,
we're essentially aggregating that information inside of Mind, so that as we deploy,
and as we get the N plus one customer benefits
from all of the learnings from the prior N customers. And so that's sort of how,
when we go into a customer environment, usually the models have already been trained just based
off of our training in the wild. And then if we see something customer specific, we're essentially
training within a customer environment and essentially feeding that into Mind. So we're building up a virtuous loop as we're aggregating
more and more customers. Yeah, that's a really interesting aspect. So essentially you initially
just trained it on basically documents generally, but now as each customer uses it, they're helping the training as well to kind of build up this mind, as you call it, to better understand the real world of unstructured data in the enterprise.
Does that mean that the system is more effective in certain verticals or certain business segments?
Or does that mean that it works pretty generally everywhere?
I mean, I don't know if every business has the same kind of files.
Yeah. So it's usually a mixed model. What happens is it works pretty well across the board. Like,
for example, we're about to go into a POC at a manufacturing company. And in their case,
there's a lot of sort of the more business-critical aspects
like financial data, business critical data, and so on,
where it'll work just as well
as within like a financial services company.
Where there's sort of unique elements
is if they have some very specific documents
related to intellectual property
that we may not have seen.
For example, you can think of a healthcare company that may have documents related to research around a specific drug.
The thematic cluster build-out of saying, okay, these are all about 200 or 500 documents that
are all talking about a pretty similar topic, that can actually work without any sort of
supervision because that's actually using a language model to figure out, okay, thematically,
they're all talking about the same thing.
It's just the labeling to be able to say,
these are research documents,
which is where some specific information
that we get within a particular customer site
can be useful to then be able to train those models.
So the answer to your question is,
it generally works pretty well across the board
without any training,
but then where we see some very specific, customer-specific information, that's what we essentially train on.
We help feed Mind so that it builds up going forward. Yeah, I've seen the same thing, Stephen.
That's a really interesting question because, I mean, for sure, there's legal documents that I
can't read or even understand what they mean. And same thing with like patent applications often look very, very strange and they're not
quite English. What about other languages? I'm guessing this has to be fairly language,
like actual like spoken language specific, right? So English versus Spanish versus French versus
Mandarin is going to be a very different language model, I'm guessing. Actually, the beauty of this is it's not.
It's actually, well, today we focused mostly on English and it's text-based just by virtue of,
it's more of a go-to-market decision. But just to sort of, you know, in geek speak,
if you look at the models themselves, all the models are doing is they're taking words,
they're looking at the structural associations of the words to build up a mathematical representation
of the document and comparing these documents in what we call a latent semantic space to say,
these are all semantically similar documents. So if you look at the base concept, the concept works across any language as long as there is sort of,
you know, the language is based off of grammar. Does it make sense? Like meaning
you could think of, you know, French and Spanish and English as all having a grammatical construct.
So within a grammatical construct, it should all work the same. Now, where it may be a little tricky is if you have pictographic languages, right? Like,
for example, Chinese, that's not a grammatical construct. And so there, the language models
will have to learn. But the models are actually pretty agnostic when it comes to grammatical
representations, because the model doesn't care that it's English or Spanish.
It's really trying to understand the associations of how the words come, you know, what's before a particular word, what's after a word to develop a mathematical representation.
Yeah, that's true of machine learning models. Often they can work with different types of data sets, as long as those data sets are consistent
with what it's been trained on. So, I mean, imagine a self-driving car driving in the snow,
or at night, or during the day, or, you know, in a green area, or in a gray desert kind of area,
you know, the machine learning model might see those
things as conceptually similar and similar patterns. And I imagine that's the same thing
that's going on here. So it's looking through the documents and it's seeing clusters of symbols,
because of course it doesn't really understand anything. It's just looking for patterns of
symbols in the documents and it sees similar clusters. And it doesn't matter if
it's written in German or English or Portuguese, it's going to be able to identify those. That's
a really interesting aspect and a really powerful use of machine learning as opposed to, as you
mentioned, regular expressions or some other previous technology. That's absolutely right. In fact, the analogy is like a stop sign in
different languages, right? You can kind of see it when you see it just by virtue of, okay,
you know the sign, you know the associations, it's exactly the same. And that's the beauty of it.
That's why these models can actually work extremely well. I mean, the reason that AI is
very, very powerful in this construct is twofold. One, just the scale at which they can operate and
basically going through hundreds of terabytes of data and being able to organize that within a
timeframe that you possibly couldn't otherwise. You wouldn't have the people, you wouldn't have
the time to essentially be able to
do that. Secondly, the flexibility in terms of being able to adapt that across different languages
and different settings, just because they're not wedded. Like it's not like humans where
they have to understand German or you have to understand French. I mean, they're essentially
using associations to figure that out. But even as a person, I could recognize this as a contract,
even though it's in German or in French.
And I think that's kind of what you're saying.
That's exactly right.
That's exactly right.
It's the pattern matching, right?
I mean, humans have, you know, evolutionarily,
we are very tuned to essentially being able to pattern match.
And the pattern matching comes from the fact that
before we learn, you know, the system one learning comes through just figuring out patterns.
And that's exactly what that's exactly what the language models do.
It's always interesting to see how artificial intelligence and specifically machine learning, which tasks of a human they can take over, which ones they can't, which ones are easy and which ones are hard for machines.
And this is one of those things where intuitively I wouldn't have guessed this was an area where machine learning would be able to make such good advances, just because that context seems like such a human thing.
But I'm still, I'm curious. So once you've found these semantic groupings, right, and you've
grouped these documents together, how do you get from understanding that these are similar documents
to understanding or assessing risk?
Yeah, that's a great question. And so I'll give that in the context of a simple example, right? Let's say there's a customer who's got a classification program where they're relying on end users to essentially classify data. And end users are going in and, okay, let's say they're creating a contract, a super sensitive contract, and the user decides to mark that document as suitable for public consumption
just because they're careless or for nefarious reasons, as an example.
Now, what the system will do is it'll take that particular document and thematically
group it with other contracts that are semantically similar.
So it's essentially doing a semantic analysis to figure out, okay, there are 20 documents
that are all within this particular cluster.
Now, once it's actually built up that semantic cluster, it's able to compare those
documents on properties like, okay, how have they been classified as an example? And if it sees,
okay, here's this one document that has been classified as being public while all of its
peer documents are classified as confidential, it's essentially using the baseline properties of the dominant
sets of documents in that peer group to identify what we call outliers. These documents that look
to have properties that look to be deviant or at a distance relative to the baseline sets of
properties for those documents. And so autonomously, the system uses what's called risk distance
to figure out, hey, here are these documents that have been classified incorrectly or have been shared outside the company compared to its peer documents, which have not been.
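A minimal sketch of this "risk distance" idea — the document names, the two attributes, and the scoring are assumptions for illustration, not Concentric's implementation — scores each document in a semantic cluster by how far its properties deviate from the cluster's dominant baseline:

```python
from collections import Counter

# A semantic cluster of peer contracts with security-relevant attributes.
peer_cluster = [
    {"doc": "contract_a", "label": "confidential", "shared_externally": False},
    {"doc": "contract_b", "label": "confidential", "shared_externally": False},
    {"doc": "contract_c", "label": "public",       "shared_externally": True},
    {"doc": "contract_d", "label": "confidential", "shared_externally": False},
]

ATTRS = ("label", "shared_externally")

def risk_distance(cluster, attrs=ATTRS):
    """Fraction of attributes on which a document deviates from the
    cluster's dominant (baseline) values; 0.0 = in line with peers."""
    baseline = {a: Counter(d[a] for d in cluster).most_common(1)[0][0] for a in attrs}
    return {d["doc"]: sum(d[a] != baseline[a] for a in attrs) / len(attrs)
            for d in cluster}

print(risk_distance(peer_cluster))  # contract_c scores 1.0, its peers 0.0
```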
And so it's essentially using those baseline properties to autonomously figure out deviant sets of attributes that can potentially place those documents at risk. And so that's sort
of how the risk monitoring and the risk insights actually works. That's a really interesting aspect
of the product because I think that, you know, classifying is one thing, but doing risk analysis
is another. And that is one of the big selling points that you're offering, right? That you will be able to, you know, basically help to assess the risk of inherent and unstructured data.
Beyond things like what you've just mentioned, what other ways can the software assess risk using machine learning?
Yeah, so risk can come from a whole bunch of dimensions.
One, sharing data, right?
So you could share data, for example, outside the company.
I'll take a simple example.
You could have documents with customer data in them
that if an end user, I mean, today in the cloud world,
if you look at Box and Dropbox or even OneDrive and so on,
they were actually architected for collaboration first.
They were architected to make it super easy, right?
Every file can pretty much have its own sharing properties.
All you do is click on a file and say, hey, share this with this person's Gmail or Yahoo,
and boom, that person has access to that particular document.
Now, that introduces a whole dimension of risk that really enterprises weren't geared
for.
So risk can come from how documents are shared.
It can even come from location.
In fact, a lot of customers will use things like
they'll put trading documents within a trading folder.
They will tighten down the trading folder
and make sure that only the right people have access to it.
But what if a user downloads the document,
modifies it, puts it in a folder for public viewing, right?
The enterprise has no clue that that has actually happened.
So attributes like location, access permissions, who has actually accessed it,
can all actually introduce risk. And today this is one of the hardest challenges for
enterprises, because if you ask a customer, if you ask an enterprise, okay, how are you going
to do this? They have to set up policy, which means upfront, they have to know exactly what they have to
do.
They have to define all of these rules.
So discovery is not the only thing that is at the mercy of policymaking.
Even risk is at the mercy of policymaking, where you have to define all these policies.
Enterprises simply don't have the ability, the knowledge, the wherewithal to essentially
be able to do that. And so giving them this sort of an ability to mine the latent
information, because the insight here is for the vast majority of people, they're going to do the
right thing, right? Deviations are going to be more the exception than the rule. And it's important
for systems to sort of creatively figure that out by using these sorts of comparisons.
And by the way, we also do what's called User 360.
So we can actually look at a user's profile and everything that the user is doing
and identify risk from a user perspective, meaning what have they done,
what are they doing that is deviant relative to all of their peer groups and so on.
And so those are all the ways in which we're able to mine for risk
without the customer
having to go in and define policies upfront.
Yeah, very good.
You know, some of the things you said there start to make me think, you know, right now
in networking, zero trust network access is a really big topic.
It's one thing that I'm really focused on.
And as I was scanning through your blog, you know, ahead of this podcast, I saw several
articles there talking about zero trust
for data security. And I'm just curious if you can explain a little bit more about that idea or
what that means. I think it ties into a lot of what you just said. Yeah, absolutely. You know,
we think of ourselves philosophically as building that sort of zero trust layer, except for data,
right? If you think about zero trust, our long-term vision is you're going to have to
build that sort of zero trust at the network layer, the application layer, and the data layer, and
so on.
And when it comes to the data layer, I mean, it's really tied to this notion of least
privilege access, which is only the right people should have access to the right sets
of data.
And how do you do that, right?
I mean, you can't do that today.
Today, the challenge with that is you have to go in and define all of these policies
that says, okay, if you're a hedge fund, for example, only the trader should have access
to trading documents.
And that should be true for the trading documents, agnostic to where they are.
It's not tied to a location.
It's not tied to a data store.
That sort of a policy should actually follow the file.
And how do you do that? That's
extraordinarily tricky. And so that's essentially what we do, which is by building up a deep context
around the data, we identify thematically what that data is. We identify what the right sets
of policies for that particular file are by doing peer comparisons to make sure that we are
essentially mining for what the least
privileged access permissions for that particular file or data element ought to be, and then
essentially enforcing those sets of permissions.
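A toy sketch of mining least-privilege permissions from peer files, in the spirit of what's described above — the file names, the users, and the simple majority rule are all invented for illustration:

```python
# Toy least-privilege mining: derive the permission baseline for a thematic
# group of files from its peers, then flag grants outside that baseline.
peer_permissions = {
    "trades_q1.xlsx": {"alice", "bob"},
    "trades_q2.xlsx": {"alice", "bob"},
    "trades_q3.xlsx": {"alice", "bob", "mallory@gmail.com"},  # extra grant
}

def excess_grants(perms):
    """Baseline = users granted on a majority of peer files; anything beyond
    that on an individual file is a candidate for remediation."""
    majority = len(perms) / 2
    all_users = set().union(*perms.values())
    baseline = {u for u in all_users
                if sum(u in p for p in perms.values()) > majority}
    return {f: p - baseline for f, p in perms.items() if p - baseline}

print(excess_grants(peer_permissions))  # {'trades_q3.xlsx': {'mallory@gmail.com'}}
```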
And especially now in a world where everybody's gone to doing remote work, collaboration has
actually exploded, just because what you could do where 10 of you could get into a room and essentially
collaborate on a whiteboard is now happening across a Teams or a Zoom session with whiteboarding.
And a lot of that information is now virtual.
And so how do you make sure that that information is only the right people have access to it?
And more importantly, when mistakes are made, either deliberately or carelessly,
you're able to quickly identify and remediate them
to make sure that you're buttoning down access
to only the right people
for the appropriate sets of information.
Well, before we wrap up the discussion today,
I want to know, is there any summary that you'd like to say?
I mean, what would you like people to take away from this discussion, generally speaking, about how AI can be used in unstructured
data situations? Yeah, I think the meta point here is that, from a security standpoint,
data remains the most vulnerable threat surface. Enterprises have spent a lot of money on
network and application security,
but data security remains a frontier, driven by the fact that enterprise data volumes are growing exponentially. A vast majority of this data is unstructured, spread across both on-premises
and cloud data stores, and enterprises often don't have a good idea of which of that data they ought to care about.
Using AI intelligently to mine for risk, you can identify scenarios where
there may be violations of your corporate security policy, and then remediate them
so that you significantly lower the odds of data loss. Because ultimately,
while enterprises care about network breaches,
a network breach eventually happens only because the hacker wants access to your data.
So it all comes down to data at the end of the day. And AI can meaningfully help you
identify what it is that you should care about and understand the risk to it, so that you can
significantly lower the odds of data loss from careless users, insiders, or compromised accounts.
Well, thank you very much for that, Karthik.
And it's great to have you here on the Utilizing AI podcast.
As I warned you at the start, before we started recording, at the end of each episode,
we like to ask our guests a few questions to surprise them
and to see what they think of the future of AI technology. So that time has come. Warning to the
audience, we have not warned him about what kind of questions he's going to get. So we'll get some
quick answers off the cuff here. Let's see what we've got. So let's start with this one.
How long do you think it will take before we have a conversational
AI that can pass the Turing test and fool an average person in a verbal exchange?
A very long time. We tend to overestimate short-term progress and underestimate long-term progress. I would say probably another 40 or 50 years,
just because human language is not just the content, it's also the tone.
And tone, I think, is going to be the hardest thing
for conversational AI platforms to be able to pick up.
So I think it's going to take a while.
Great. Okay. Thank you. Next, one of the things that we talk about quite a lot on the Utilizing
AI podcast is the inherent bias in machine learning models based on what information
they've been fed. Do you think that it's possible to create a truly unbiased AI? I think the answer is yes.
It is a function of training data sets and so on.
And it's yes, driven by the fact that I think if people are focused on the problem, it's actually a very solvable problem, in my opinion.
All right. And one more question here. Can you think of any fields, any industries, any jobs that have not yet been touched at all
by artificial intelligence?
Boy, no, I can't think of any.
I think efforts have been made
in almost every frontier that I can think of.
And now some have been more successful than others, but no, not that I can think of.
Well, I guess that's why we're doing this podcast, because essentially artificial intelligence
is touching everything in every enterprise, every business, every field of study.
Well, thank you so much, Karthik, for joining us today.
Where can people connect with
you to follow your thoughts on artificial intelligence, machine learning, and unstructured
data? You can find me on Twitter at KK underscore Karthik. You can also connect with me on LinkedIn.
I also have a blog, and in my copious spare time I do write quite a bit, which you can find at www.concentric.ai.
And that's another way to essentially be able to follow my work.
How about you, Chris?
Yeah, you can find me on Twitter at Chris Grundemann or online, chrisgrundemann.com.
All right.
Thank you.
And you can find me, Stephen Foskett, at S Foskett on most social media sites, including
Twitter, which is probably my main method of people connecting with me.
Also, I will point out, we are currently planning our next AI Field Day event.
If you'd like to get involved in our second AI Field Day
event, just go to techfieldday.com, click on the little brain AI icon there, and you can learn
more about that event series and see who's coming and joining us at AI Field Day. Well, thank you
very much for listening to the Utilizing AI podcast. If you enjoyed this discussion, remember
to subscribe, rate,
and review the show on iTunes since that does help our visibility.
And please do share this show with your friends.
This podcast is brought to you by gestaltit.com, your home for IT coverage across the enterprise.
For show notes and more episodes, go to utilizing-ai.com, find us on Twitter at utilizing underscore AI,
or subscribe in your favorite podcast application.
Thanks, and we'll see you next week.