Orchestrate all the Things - Alation announces Open Data Quality Initiative as part of its data intelligence strategy. Featuring CEO / Co-founder Satyen Sangani
Episode Date: May 24, 2022
Data quality is part of data intelligence. It's a topic that a lot of people are concerned about, and it makes engagement and adoption around data intelligence solutions better. With many data quality solutions with different approaches available in the market, customers need to be able to choose the one that works best for them. Plus, if you are someone like Alation, a vendor whose core business is not data quality -- if you can't beat them, join them. As Alation CEO and co-founder Satyen Sangani shared, that was the thinking behind today's announcement of the Alation Open Data Quality Initiative (ODQI) for the modern data stack. As Alation notes, the program provides customers with the freedom of choice and flexibility when choosing the best data quality and data observability vendors to fit the needs of their modern, data-driven organizations. We caught up with Sangani to discuss the ODQI and where it fits in the broader data intelligence landscape, as well as Alation's strategy and evolution.
Article published on VentureBeat
Transcript
Welcome to the Orchestrate All the Things podcast.
I'm George Anadiotis and we'll be connecting the dots together. Data quality is part of data intelligence.
It's a topic that a lot of people are concerned about and it makes engagement
and adoption around data intelligence solutions better.
With many data quality solutions with different approaches available in the market,
customers need to be able to choose the one that works best for them.
Plus, if you're someone like Alation, a vendor whose core business is not data quality,
if you can't beat them, join them.
As Alation CEO and co-founder Satyen Sangani shared,
that was the thinking behind today's announcement of the Alation Open Data Quality Initiative for the Modern Data Stack.
I hope you will enjoy the podcast.
If you like my work, you can follow Linked Data Orchestration on Twitter, LinkedIn, and Facebook.
I guess I'd best describe myself as a former analyst or an analyst myself.
And so in the sense that academically, I was trained in economics first at Columbia, then at Oxford.
Between those two stints, I worked in investment banking,
and so did financial analytics work. That wasn't quite my cup of tea because I really wanted to
build something. And so ultimately, after getting out of grad school, I started working at Oracle,
first as a product manager, and then grew up over the course of 10 years
to become effectively a general manager
in a business that would sell financial applications
and analytical applications to large finance companies.
So think National Australia Bank and Citibank
and Bank of America.
The work there really informed what I am doing here at Alation in the sense that we would sell these large scale packages to these big companies that would help them analyze their data.
And what you would ultimately find in that work is that the companies didn't really understand the data itself. And so often what you would see is that two years, hundreds of millions of dollars
would be spent standing up the software. And often a lot of that time was spent on locating which
data had the right, which systems have the right data, how the data was used, what the data meant.
Often there were multiple copies of the data, conflicting records with the data.
And the people who understand the systems and the data models were often outside of the company,
not inside of the company.
And so my realization there was on one level
as a consumer of data and a former analyst,
and then on the other is now a supplier of software.
I kind of realized that all of this kind of data modeling,
data schemas, data writ large,
was really sort of that the description of that data was
really more of a knowledge management problem than it was a technical problem.
I think previously people had, you know, thought of it as a technical problem and
so the insight for Alation came out of that. And so then in 2012 left Alation,
sorry, left Oracle, started Alation, you know, within months of leaving.
And, you know, I guess here we are 10 years on and I'm working on the same problem.
So, you know, I'm either boring or persistent or both.
It's definitely not solved.
So, well, I wouldn't blame you for sticking around.
Yeah, I think that's actually, I do think it's a very rich problem space, right?
I mean, for lots of reasons, like it sort of exists the, you know, on one level, there's sort of a human psychology aspect on another level, there's sort of a didactic aspect in
terms of how do you enable and teach people how to use, you know, quantitative reasoning
and thinking and scientific method better, which is obviously a, you know, first world hard problem. And then on the other hand, there's a whole bunch of problems in terms
of both computer science around machine learning and AI that one would have to contend with. And
then, you know, certainly just the, you know, general day-to-day challenge of building a
software company. So yeah, I mean, the work is fabulous and who could complain about being able
to, you know, solve it. And I feel like it's one of those problems where we could work for another two, three,
four or five decades and we'll make progress, but there still will be work to do.
Yeah.
Yeah.
And well, since you have been around for a long, long time, I thought, and by the way,
it really helped me in the fact that you have a timeline actually embedded somewhere
in your website. So while doing a little bit of background research, I stumbled upon them and it
was really helpful. Usually, you know, you're able to cobble together those facts, but well,
the fact that you have them all together in one place really helps when you want to do that sort
of thing. So I thought, well, obviously, you know, those I thought well obviously you know those facts and that
timeline very very well so I thought let's perhaps take a short path through time together and
I'll let you pick what you think are the most well significant let's say points in time along
that timeline because I have the feeling that going through
the timeline as well as the related facts, let's say along that timeline, I get the feeling that
Alation as a company has somehow evolved and taken different directions throughout that
time. So you started out from well, data catalogs, and you have added a few other elements through time. So would you like to pick out the points through that time that you think are the
richer in terms of adding to your initial destination?
Yeah, I think there's probably two timelines that are relevant.
The first is the market timeline, which obviously operates on its
own speed and pace. We're an influencer there, but we're not necessarily the exclusive sort of
setter of trends, right? So I think the market started in a world broadly that centered around
this concept of metadata management, where a lot of software that was
in this space and the data middleware space per se was sold into IT. And I think that described
the market for a reasonable period of time, probably, you know, certainly up till at least
about 2009, 2010. Somewhere around that time, same time period, 9, 10, there was a beginning of a market
that sort of coincided with the rise of Tableau, but much more interestingly, sort of Basel II and
HIPAA and the rise of privacy and the rise of sort of information management somewhat evolving out
of the 2008 crisis, but also evolving out of just bigger awareness and bigger penetration of the
internet. And that started sort of a movement around data governance.
And then certainly, I think with the advent of Hadoop,
there was this massive, right around 2012, 2013,
when we were founded, massive explosion of data
and the ability of companies to exploit data science
as a competitive differentiator.
And that gave rise to the thing
that we created, which is called the data catalog, which was a much more consumer-led
information management framework. So I think that in the background sort of both describes
the market history around these three sectors, which we think are ultimately now coming together,
metadata management, data governance, data cataloging into a broader market space, which
we and others call data intelligence.
And I think those three spaces are convergent, right?
I think in our evolution, we were founded, of course, in 2012.
I left Oracle.
Within the early timeline, so much happened in those early three years.
That set the stage for what's happening right now.
Specifically met my co-founding team late 2012, two of whom were individuals who were
from Google, the other of whom was from Apple.
We stayed in stealth for quite a long period of time.
And so the company didn't launch itself until, I think it was, March 2015. And in that time,
we really just worked with roughly 10 customers that allowed us to define the
product and allowed us to define what we were trying to do. And that gestation was
not only important for the technological development, but also for just really
discovering who it was that was trying to use this product, and how does
it differ from the other two,
data governance and metadata management products, that were out there.
After 2015, we really went through a phase
where we basically spent about two years
just creating the category.
What is a data catalog was new to lots of people.
People didn't understand the concept.
Lots of people thought it was a feature.
And that work was probably about you know roughly till about maybe 2017-2018. The category started to form and then what we found
was that other players from metadata management, from data governance, started
to also converge on building a data catalog and we found that in that time
period we had to respond by
entering a couple of different markets, data governance and metadata management, almost as
a response. But we did so as a smaller player, but we did so with a much more clear platform
approach where we basically said, look, having the inventory of the data, having all the people
using the data is the competitive differentiator because unlike those products centered in
either compliance or in IT, those are very narrow audiences. And this ability to be able to have
something that's used by thousands of people and attaching to thousands of systems is the
core differentiator. And so that over the last four years has been borne out to be true. And
what we're finding is the company is growing faster than it ever has
because this catalog is actually a platform to help people solve these problems
in a way that's more efficient than perhaps the other technologies
would have been on a standalone basis.
And so now I think, you know, fast forwarding to today
and covering all of that ground, you know,
our intention is to
really win this data intelligence market. And I think we're going to try to do that with, you know,
simultaneous investment in just go-to-market capabilities, but also in just building the
technology and making it, you know, as bulletproof as we possibly can over time.
Yeah, thank you. And I think that helps because, well, for someone coming to this, well, actually, that's
part of the problem.
I'm kind of hard pressed to put my finger on a single term, a single definition and
say like, okay, so this is like the market that you and companies in that space are addressing. So there's, again, different terms flying around,
and it's not always easy to pinpoint that.
And what you just mentioned kind of helps explain the scope,
let's say, of the issues that you're addressing a little bit.
And in your timeline, I heard a couple of things that I was expecting to hear.
So about Hadoop, for example, because obviously, that led to, you know, democratization of big data
and therefore the need to manage all that data. But I have to say that I was also expecting to
hear something that I only got an indirect reference to. So you said something like,
excuse me, like, four years ago, you saw an uptick in the market
and that kind of coincides with the advent of GDPR and the requirements that you also
refer to the previous regulation, like HIPAA, for example.
And when you have regulation like that, it always brings a sort of uptick in the market
because people have to comply and therefore
they have to monitor data and so on.
So would you say that was also an important time in your timeline?
Yeah.
And for two reasons. I think that for all regulation, data governance largely is kind
of the enabling capability that people invest in when they want to comply with regulations.
And so certainly starting with Basel II and then HIPAA and then absolutely with GDPR and CCPA,
that has been a significant catalyst. The GDPR regs in particular caused a rethink for all
of the data that most customers store, and you couldn't solve it
with their traditional data governance framework. So in a traditional data governance framework,
often you have top-down policies that would be sort of, in theory, you know, attested to by people
who are touching and using the data. But with GDPR, you actually had to delete physically the
data that exists inside of these companies.
And to do that, you'd need a really strong inventory of the data.
And so that caused a convergence or I think accelerated the convergence between cataloging and metadata and governance because you had to have a holistic framework where the previous regs didn't quite require that in the exact same way.
Thank you.
Okay, so I think we have enough background covered.
So we may as well shift gears and come to what you're about to announce the day after
tomorrow, actually, I think is the date.
So it's a new initiative, which is called Open Data Quality Initiative and well
having a first look at what it is like in the draft press release that I had the opportunity
to look at, I have to admit I had a little bit of trouble identifying exactly what it is, because on the surface, it looks sort of like
opening, let's say, API access to your core product in an industry-friendly way.
But again, the word initiative also kind of implies that, well, perhaps there's more to it,
basically. So it implies things like, well, other stakeholders having a say in it
or perhaps some sort of broader governance.
So I was wondering if you could enlighten us a little bit on what it is exactly.
Yeah, absolutely.
So I think important to start with the strategy and the mission, right?
So in our case, what we're
basically saying is we believe this data catalog is the platform for this broader category around
data intelligence. Now, data intelligence, an IDC-identified category, has a lot of different
components to it. Master data management's a good example of what is part of data intelligence.
Privacy data management is a part of data intelligence.
So too is reference data management.
So too is, in some cases, data transformation.
In some cases, we have other capabilities.
In this case, we're talking about data quality and data observability.
Now, I think a lot of the historical players in this space have sort of taken what I'll
call a vertical approach, where they've basically said, we're going to own one box of every single one of these things.
We're going to have a data quality solution. We're going to have a master data management solution.
We're going to have these multiple solutions in these spaces, and that's the way in which we're going to win.
We're going to differentiate horizontally, and we're going to try to sell a single package. Our strategy is basically to say something that's quite different and quite kind of the opposite
to what these historical players have traditionally said, which is, look,
the real problem in this space is not whether or not you have the capability to tag data.
That is a problem, but it's certainly not the big problem.
The big problem is really engagement and adoption.
Most people don't use data properly. Most people don't have an understanding of what data exists. Most people
don't engage with the data. Most of the data is under-documented. And so this idea of the data
catalog is really all about engaging people into the data sets. But if that's our strategy to
basically focus on engagement and adoption, that means that there are some things that
strategically we're not doing. And what we're not doing is building a data quality solution. What we're not doing is
building a data observability solution. What we are not doing is building a master data management
solution. Now we are building some capabilities like lineage and data governance and certainly
data cataloging, which is where it historically existed. But to basically deliver around engagement and adoption,
which is what our customers are looking for,
our customers really need to be able to go and operate
and take solutions from the rest of the market.
And so in that sense, we basically said,
look, one, data quality is a hot topic.
A lot of people are concerned about it.
Certainly makes engagement and adoption
around data intelligence solutions better.
So it's something that customers want to buy either after buying their data catalog or
maybe sometimes even with buying their data catalog.
But, you know, can we be competitive?
Can we actually build a solution ourselves?
And what we realized was no.
And that was true for two reasons.
I think, first of all, this is a really quickly evolving market.
You know, you'll notice that on the press release, there's companies like Soda, Bigeye, Anomalo.
These are all companies that have been funded in the last two years.
And interestingly, while they all sound like they're in the same space, have very different
approaches for how to do data quality.
And so the important idea was, well, this is actually a problem where there are people who are taking multiple approaches with multiple different buying
audiences to the different problems around data quality and data
observability. So why not partner with all of these folks, let our customers
choose which solution is most appropriate for them, and then allow them
to innovate? And then, you know, of course, the other question around innovation is,
can we do it better than these folks? And what we realized was the answer is not really. We don't
have massive competitive differentiation outside of the information in our catalog, which we're
happy to share. And that really is a good example of what turns us into a
platform.
Okay, I see. So in a way, the role you're trying to play, let's say, with this initiative is to sort of act like the middleman, like, well, the committee of sorts that stands in the middle and lets people work with each other and sort of defines what a good API is, or what a gold standard for interoperability would be for data quality?
That's exactly right. Like what we're trying to do is to say, look, we know data quality is
important. There's a lot of different ways to do data quality. There's a lot of different ways to
assess and measure data quality. There's a lot of different ways to build a data quality framework
and policy system. We know that as a catalog and governance provider,
customers as a part of those programs will need to have great data quality, but there are different
ways to peel the onion as it were. And so customers will then choose what they need to be able to
choose and they'll integrate with Alation appropriately. And we have a standardized
integration framework that will show them all that information. We'll factor into our search algorithms.
We'll factor into our governance and AI algorithms so that we can get the best benefits of scalability,
but allow customers to choose the right solution for them at the right price.
Okay, I see.
And well, obviously, there's a number of other companies that are already signing up, let's say, to be part of that program. And so I'm wondering whether you had pre-existing relationships, some partnership of sorts with
all of them, or they were contacted precisely for this purpose and they liked the initiative
and decided to come on board.
Yeah, the answer is both.
There's probably, I would say, the majority of partners have a common customer
with Alation where there are basically one or more customers,
and in some cases more than 10, but I would say certainly a handful,
where we basically have partnerships and also really key sort of
customer success stories. So the answer is both. But it's interesting, as we went out and said,
hey, we have this framework that we're building, and started talking to partners in the space,
many of them said, look, we really want to sign on to that, because we need a standard way
of being able to talk to all of the rest of the industry and the customers that
want to be able to use us, and we really have a hard time explaining that right
now in terms of who ought to be using and touching data quality data, what are
the start points, what are the end points. So by having this framework, on some
level they're defining better the borderlines of their market space and
the problems they're solving, and also the problems they're not solving.
Okay, I see.
So in a way, this is going to become, let's say,
a sort of de facto standard of limited scope, at least initially.
Do your ambitions go as far as for it to become a sort of,
not de facto, but actual standard in some way? That would also imply, obviously, going out and talking to some of your competitors as well, because
well, if you want to, to have a standard for that, I guess they have to come on board as
well at some point.
We would certainly aspire to that. I think that to get any standard evolved or built, the first thing that you need to do is
gain adoption. And so I think often people say, oh, we're going to go build a standard. Let's go
recruit a whole bunch of partners or competitors or new SIs who are going to implement around this
standard. But the reality is customers are the ones who define whether standards live or die.
And so the best thing you can focus on is getting more customers to actually adopt the software. We definitionally, and I think you're
right, this is an example of being open in a way that we've been doing historically. So a couple of
quarters ago, we announced the open connector framework, which will allow and allows anybody
to build a connector for metadata for any
data system. So this could be a database or a BI tool or file system. This is an extension of that
to now moving into data quality. And you should expect to see us build these open integrations
and frameworks over time, because we do think that in the world of data management there has to be a
consistent way to sort of share this metadata.
And if that doesn't exist,
then people are gonna be trapped,
you know, at some level relying upon their own systems
because people won't be able to benefit
from the knowledge inside of their
various software applications.
Okay, well, okay.
So that's a good segue then for me
to go to the last part of the conversation,
or the last set of questions I had lined up for you,
which basically had to do with your roadmap
and where that specific initiative fits in your roadmap.
But since you kind of mentioned interoperability in metadata and all that,
I'm tempted to actually ask you a more specialized question.
So when I hear the terms metadata and interoperability,
well, especially in the same sentence,
I'm kind of tempted, inclined to think of a specific technology there.
So basically, knowledge graphs and RDF and all that, because well, that's first because I
have a background in that. And second, because well, it's kind of, it's kind of obvious. And
it's also kind of trending, let's say, lately. And I also have to share some anecdotal, let's say,
experience here. Not too long ago, I was at a conference around those topics, and there were also some people from Alation
who were very interested in getting to learn more about that.
And we kind of struck up a conversation,
and they said that at least at the time,
and that was like sometime like four years ago or something,
they said that they didn't have any immediate plans of using the technology.
I'm wondering if that has changed at all during the time that has transpired since then.
And, well, again, that's a more specific question.
And to come back to the more general one, so what's your roadmap going forward?
Yeah.
So I think there's linked data as a concept and there's linked data as a technology.
When we started the company, I had done so actually having done a lot of research into RDF, semantic web, and linked data as a general concept.
And we looked, back in 2012, at implementing Alation on top of those frameworks. But what we realized was the big problem in metadata wasn't so much the combination of
data or the reasoning that one could build off of getting the metadata.
It was just simply acquiring the metadata itself.
On some level, it's the plumbing.
And often I describe myself as a plumber, in that I've been doing a lot of plumbing for
the last 10 years.
And on some level, this kind of example of the open data framework, data quality framework is an example of more plumbing.
It's allowing people to orchestrate integration into the platform. I do think that over time,
you will see us leverage knowledge graphs in order to be able to make inference and recommendation
off of data that will exist within the platform.
Because that is where I think all customers would love to go.
They'd love to go to a situation where, wow, don't just let me search for something,
recommend to me something.
Don't tell me how to tag this data, tag it automatically for me.
And certainly linked data and AI and machine learning are all elements and ingredients
to being able to build a more intelligent data intelligence layer.
So I do think there is an opportunity to use those technologies.
I think there's a difference between using those technologies as a baseline mechanism
to solve some of the very, I would say, unsexy technical problems that exist
where perhaps linked data may not be the right solution for all sort of
orchestration and acquisition of the data. But I do think, in terms of once you've orchestrated it,
once you've acquired it, once you've cleaned it, then linked data can be an extremely powerful
technology to do a whole bunch of stuff around recommendation, inference, and association.
I think, by the way, that you're already using at least some machine learning,
and I'm kind of guessing that you're probably doing that to do things like recommendations for
things such as, well, I don't know, which fields people should merge or what else to look at or
that kind of thing.
That's exactly right. One great example of that is around where we leverage NLP
to be able to do named entity recognition.
So we can see a term TXN, recognize that that means transaction,
or alternatively, that might mean trucks,
and we might be able to then make a recommendation
based upon the linguistic model within the company
to say which is which.
And then we can do that automatically within the framework.
And then there are other examples that we're using for our more recent
acquisition, Lyngo, where we are basically allowing people
to write English language sentences and convert that into SQL
to be able to do interactive interrogation of data
sets and queryable data sets.
Yeah, well, like you said,
when you do the kind of thing that you do,
there's a lot of plumbing involved inevitably,
but well, at least you get to build some cool stuff
on top of it too.
Yeah, for sure.
And I think that's what I'm excited about.
I mean, as far as the space goes, yes, you have to do, you have to basically solve the problems in front of you and in front of your customers.
And while it is seemingly unsexy to have to build all of this sort of commonality, kind of the single layer, if you will, with all of these connectors, it enables you to do so many things that allows for acceleration over time.
And so on some level, actually on every level, I'm probably more excited about
what we're able to do in the next five years than I think what we've done in the past five,
because all of it lays the foundation for just some really cool applications that we'll
start seeing in the near term.
I hope you enjoyed the podcast.
If you like my work, you can follow Linked Data Orchestration on Twitter, LinkedIn, and Facebook.