Orchestrate all the Things - Alation announces Open Data Quality Initiative as part of its data intelligence strategy. Featuring CEO / Co-founder Satyen Sangani

Starting point is 00:00:00 Welcome to the Orchestrate All the Things podcast. I'm George Anadiotis, and we'll be connecting the dots together. Data quality is part of data intelligence. It's a topic that a lot of people are concerned about, and it makes engagement and adoption around data intelligence solutions better. With many data quality solutions with different qualities available in the market, customers need to be able to choose the one that works best for them. Plus, if you're someone like Alation, a vendor whose core business is not data quality: if you can't beat them, join them.

Starting point is 00:00:30 As Alation CEO and co-founder Satyen Sangani shared, that was the thinking behind today's announcement of the Alation Open Data Quality Initiative for the Modern Data Stack. I hope you will enjoy the podcast. If you like my work, you can follow Linked Data Orchestration on Twitter, LinkedIn, and Facebook. I guess I'd best describe myself as a former analyst, or an analyst myself, in the sense that academically I was trained in economics, first at Columbia, then at Oxford. Between those two stints, I worked in investment banking, and so did financial analytics work. That wasn't quite my cup of tea because I really wanted to

Starting point is 00:01:13 build something. And so ultimately, after getting out of grad school, I started working at Oracle, first as a product manager, and then grew up over the course of 10 years to become effectively a general manager in a business that would sell financial applications and analytical applications to large finance companies. So think National Australia Bank and Citibank and Bank of America. The work there really informed what I am doing here at Alation in the sense that we would sell these large scale packages to these big companies that would help them analyze their data.

Starting point is 00:01:53 And what you would ultimately find in that work is that the companies didn't really understand the data itself. And so often what you would see is that two years and hundreds of millions of dollars would be spent standing up the software. And often a lot of that time was spent on locating which systems had the right data, how the data was used, what the data meant. Often there were multiple copies of the data, conflicting records within the data. And the people who understood the systems and the data models were often outside of the company, not inside of the company. And so my realization there was on one level as a consumer of data and a former analyst,

Starting point is 00:02:34 and then on the other as now a supplier of software. I kind of realized that all of this data modeling, data schemas, data writ large, that the description of that data was really more of a knowledge management problem than it was a technical problem. I think previously people had, you know, thought of it as a technical problem, and so the insight for Alation came out of that. And so then in 2012 I left Oracle and started Alation, you know, within months of leaving.

Starting point is 00:03:08 And, you know, I guess here we are 10 years on and I'm working on the same problem. So, you know, I'm either boring or persistent or both. It's definitely not solved. So, well, I wouldn't blame you for sticking around. Yeah, I do think it's a very rich problem space, right? I mean, for lots of reasons. On one level, there's sort of a human psychology aspect. On another level, there's sort of a didactic aspect in terms of how do you enable and teach people how to use, you know, quantitative reasoning and thinking and scientific method better, which is obviously a, you know, first world hard problem. And then on the other hand, there's a whole bunch of problems in terms

Starting point is 00:03:49 of both computer science around machine learning and AI that one would have to contend with. And then, you know, certainly just the, you know, general day-to-day challenge of building a software company. So yeah, I mean, the work is fabulous and who could complain about being able to, you know, solve it. And I feel like it's one of those problems where we could work for another two, three, four or five decades and we'll make progress, but there still will be work to do. Yeah. Yeah. And well, since you have been around for a long, long time, I thought, and by the way,

Starting point is 00:04:21 it really helped me that you have a timeline actually embedded somewhere in your website. So while doing a little bit of background research, I stumbled upon it and it was really helpful. Usually, you know, you're able to cobble together those facts, but, well, the fact that you have them all together in one place really helps when you want to do that sort of thing. So I thought, well, obviously you know those facts and that timeline very, very well. So I thought, let's perhaps take a short path through time together, and I'll let you pick what you think are the most, well, significant, let's say, points in time along that timeline, because I have the feeling that going through

Starting point is 00:05:05 the timeline as well as the related facts, let's say, along that timeline, I get the feeling that Alation as a company has somehow evolved and taken different directions throughout that time. So you started out from, well, data catalogs, and you have added a few other elements through time. So would you like to pick out the points through that time that you think are the richest in terms of adding to your initial destination? Yeah, I think there's probably two timelines that are relevant. The first is the market timeline, which obviously operates on its own speed and pace. We're an influencer there, but we're not necessarily the exclusive sort of setter of trends, right? So I think the market started in a world broadly that centered around

Starting point is 00:06:01 this concept of metadata management, where a lot of software that was in this space and the data middleware space per se was sold into IT. And I think that described the market for a reasonable period of time, probably, you know, certainly up till at least about 2009, 2010. Somewhere around that same time period, 2009, 2010, there was the beginning of a market that sort of coincided with the rise of Tableau, but much more interestingly, sort of Basel II and HIPAA and the rise of privacy and the rise of sort of information management, somewhat evolving out of the 2008 crisis, but also evolving out of just bigger awareness and bigger penetration of the internet. And that started sort of a movement around data governance.

Starting point is 00:06:48 And then certainly, I think with the advent of Hadoop, there was this massive, right around 2012, 2013, when we were founded, massive explosion of data and the ability of companies to exploit data science as a competitive differentiator. And that gave rise to the thing that we created, which is called the data catalog, which was a much more consumer-led information management framework. So I think that in the background sort of both describes

Starting point is 00:07:15 the market history around these three sectors, which we think are ultimately now coming together, metadata management, data governance, data cataloging, into a broader market space, which we and others call data intelligence. And I think those three spaces are convergent, right? I think in our evolution, we were founded, of course, in 2012. I left Oracle. Within the early timeline, so much happened in those early three years. That set the stage for what's happening right now.

Starting point is 00:07:47 Specifically, I met my co-founding team in late 2012, two of whom were individuals from Google, the other of whom was from Apple. We stayed in stealth for quite a long period of time. And so the company didn't launch itself until, I think it was, March 2015. And in that time we really just worked with roughly 10 customers that allowed us to define the product and allowed us to define what we were trying to do. And that gestation was not only important for the technological development, but also for just really discovering, like, who was it that was trying to use this product, and how does

Starting point is 00:08:22 it differ from the other two, the data governance and metadata management products that were out there. After 2015, we really went through a phase where we basically spent about two years just creating the category. What a data catalog is was new to lots of people. People didn't understand the concept. Lots of people thought it was a feature.

Starting point is 00:08:49 And that work lasted, you know, roughly until about maybe 2017-2018. The category started to form, and then what we found was that other players from metadata management, from data governance, started to also converge on building a data catalog. And we found that in that time period we had to respond by entering a couple of different markets, data governance and metadata management, almost as a response. We did so as a smaller player, but we did so with a much clearer platform approach, where we basically said, look, having the inventory of the data, having all the people using the data is the competitive differentiator, because unlike those products centered in

Starting point is 00:09:25 either compliance or in IT, those are very narrow audiences. And this ability to be able to have something that's used by thousands of people and attaching to thousands of systems is the core differentiator. And so that over the last four years has been borne out to be true. And what we're finding is the company is growing faster than it ever has, because this catalog is actually a platform to help people solve these problems in a way that's more efficient than perhaps the other technologies would have been on a standalone basis. And so now I think, you know, fast forwarding to today

Starting point is 00:10:01 and covering all of that ground, you know, our intention is to really win this data intelligence market. And I think we're going to try to do that with, you know, simultaneous investment in go-to-market capabilities, but also in building the technology and making it, you know, as bulletproof as we possibly can over time. Yeah, thank you. And I think that helps because, well, for someone coming to this, well, actually, that's part of the problem. I'm kind of hard pressed to put my finger on a single term, a single definition and say like, okay, so this is like the market that you and companies in that space are addressing. So there's, again, different terms flying around,

Starting point is 00:10:48 and it's not always easy to pinpoint that. And what you just mentioned kind of helps explain the scope, let's say, of the issues that you're addressing a little bit. And in your timeline, I heard a couple of things that I was expecting to hear. So about Hadoop, for example, because obviously, that led to, you know, democratization of big data and therefore the need to manage all that data. But I have to say that I was also expecting to hear something that I only got an indirect reference to. So you said something like, excuse me, like, four years ago, you saw an uptick in the market

Starting point is 00:11:26 and that kind of coincides with the advent of GDPR and its requirements. You also referred to previous regulation, like HIPAA, for example. And when you have regulation like that, it always brings a sort of uptick in the market because people have to comply and therefore they have to monitor data and so on. So would you say that was also an important time in your timeline? Yeah. And for two reasons. I think that, first of all, data governance largely is kind

Starting point is 00:12:00 of the enabling capability that people invest in when they want to comply with regulations. And so certainly starting with Basel II and then HIPAA and then absolutely with GDPR and CCPA, that has been a significant catalyst. The other thing about those regs in particular is that they caused a rethink of all of the data that most customers store, and you couldn't solve it with the traditional data governance framework. So in a traditional data governance framework, often you have top-down policies that would be, in theory, you know, attested to by people who are touching and using the data. But with GDPR, you actually had to physically delete the data that exists inside of these companies.

Starting point is 00:12:47 And to do that, you'd need a really strong inventory of the data. And so that caused a convergence, or I think accelerated the convergence, between cataloging and metadata and governance, because you had to have a holistic framework where the previous regs didn't quite require that in the exact same way. Thank you. Okay, so I think we have enough background covered. So we may as well shift gears and come to what you're about to announce the day after tomorrow, actually, I think is the date. So it's a new initiative, which is called the Open Data Quality Initiative. And, well, having a first look at what it is in the draft press release that I had the opportunity

Starting point is 00:13:36 to look at, I have to admit I had a little bit of trouble identifying exactly what it is, because on the surface, it looks sort of like opening, let's say, API access to your core product in an industry-friendly way. But again, the word initiative also kind of implies that, well, perhaps there's more to it, basically. So it implies things like, well, other stakeholders having a say in it or perhaps some sort of broader governance. So I was wondering if you could enlighten us a little bit on what it is exactly. Yeah, absolutely. So I think it's important to start with the strategy and the mission, right?

Starting point is 00:14:24 So in our case, what we're basically saying is we believe this data catalog is the platform for this broader category around data intelligence. Now, data intelligence, an IDC-identified category, has a lot of different components to it. Master data management is a good example of what is part of data intelligence. Privacy data management is a part of data intelligence. So too is reference data management. So too is, in some cases, data transformation. In some cases, we have other capabilities.

Starting point is 00:14:55 In this case, we're talking about data quality and data observability. Now, I think a lot of the historical players in this space have sort of taken what I'll call a vertical approach, where they've basically said, we're going to own one box of every single one of these things. We're going to have a data quality solution. We're going to have a master data management solution. We're going to have these multiple solutions in these spaces, and that's the way in which we're going to win. We're going to differentiate horizontally, and we're going to try to sell a single package. Our strategy is basically to say something that's quite different and quite kind of the opposite to what these historical players have traditionally said, which is, look, the real problem in this space is not whether or not you have the capability to tag data.

Starting point is 00:15:37 That is a problem, but it's certainly not the big problem. The big problem is really engagement and adoption. Most people don't use data properly. Most people don't have an understanding of what data exists. Most people don't engage with the data. Most of the data is under-documented. And so this idea of the data catalog is really all about engaging people into the data sets. But if that's our strategy to basically focus on engagement and adoption, that means that there are some things that strategically we're not doing. And what we're not doing is building a data quality solution. What we're not doing is building a data observability solution. What we are not doing is building a master data management

Starting point is 00:16:14 solution. Now we are building some capabilities like lineage and data governance and certainly data cataloging, which is where it historically existed. But to basically deliver around engagement and adoption, which is what our customers are looking for, our customers really need to be able to go and operate and take solutions from the rest of the market. And so in that sense, we basically said, look, one, data quality is a hot topic. A lot of people are concerned about it.

Starting point is 00:16:40 Certainly makes engagement and adoption around data intelligence solutions better. So it's something that customers want to buy either after buying their data catalog or maybe sometimes even with buying their data catalog. But, you know, can we be competitive? Can we actually build a solution ourselves? And what we realized was no. And that was true for two reasons.

Starting point is 00:16:59 I think, first of all, this is a really quickly evolving market. You know, you'll notice that on the press release, there's companies like Soda, Bigeye, Anomalo. These are all companies that have been funded in the last two years. And interestingly, while they all sound like they're in the same space, they have very different approaches for how to do data quality. And so the important idea was, well, this is actually a problem where there are people who are taking multiple approaches with multiple different buying audiences to the different problems around data quality and data observability. So why not partner with all of these folks, let our customers

Starting point is 00:17:37 choose which solution is most appropriate for them, and then allow them to innovate? And then, you know, of course, the other question around innovation is, can we do it better than these folks? And what we realized was the answer is not really. We don't have massive competitive differentiation outside of the information in our catalog, which we're happy to share. And that really is a good example of what turns us into a platform. Okay, I see. So in a way, the role you're trying to play, let's say, with this initiative is to sort of act like the middleman, like, well, the committee of sorts that stands in the middle and lets people work with each other and sort of defines what a good API is, or what a gold standard for interoperability would be for data quality? That's exactly right. Like what we're trying to do is to say, look, we know data quality is important. There's a lot of different ways to do data quality. There's a lot of different ways to

Starting point is 00:18:34 assess and measure data quality. There's a lot of different ways to build a data quality framework and policy system. We know that as a catalog and governance provider, customers as a part of those programs will need to have great data quality, but there are different ways to peel the onion, as it were. And so customers will then choose what they need to be able to choose and they'll integrate with Alation appropriately. And we have a standardized integration framework that will show them all that information. We'll factor it into our search algorithms. We'll factor it into our governance and AI algorithms so that we can get the best benefits of scalability, but allow customers to choose the right solution for them at the right price.

Starting point is 00:19:15 Okay, I see. And well, obviously, there's a number of other companies that are already signing up, let's say, to be part of that program. And so I'm wondering whether you had pre-existing relationships, some partnership of sorts with all of them, or they were contacted precisely for this purpose and they liked the initiative and decided to come on board. Yeah, the answer is both. There's probably, I would say, the majority of partners have a common customer with Alation where there are basically one or more customers, and in some cases more than 10, but I would say certainly a handful,

Starting point is 00:20:01 where we basically have partnerships and also really key sort of customer success stories. So the answer is both. But it's interesting, as we went out and said, hey, we have this framework that we're building, and started talking to partners in the space, many of them said, look, we really want to sign on to that, because we need a standard way of being able to talk to all of the rest of the industry and the customers that want to be able to use us, and we really have a hard time explaining that right now in terms of who ought to be using and touching data quality data, what are the start points, what are the end points. So by having this framework, on some

Starting point is 00:20:36 level they're defining better the borderlines of their market space and the problems they're solving, and also the problems they're not solving. Okay, I see. So in a way, this is going to become, let's say, a sort of de facto standard of limited scope, at least initially. Do your ambitions go as far as for it to become a sort of, not de facto, but actual standard in some way? That would also imply, obviously, going out and talking to some of your competitors as well, because, well, if you want to have a standard for that, I guess they have to come on board as

Starting point is 00:21:16 well at some point. We would certainly aspire to that. I think that to get any standard evolved or built, the first thing that you need to do is gain adoption. And so I think often people say, oh, we're going to go build a standard. Let's go recruit a whole bunch of partners or competitors or new SIs who are going to implement around this standard. But the reality is customers are the ones who define whether standards live or die. And so the best thing you can focus on is getting more customers to actually adopt the software. We definitionally, and I think you're right, this is an example of being open in a way that we've been doing historically. So a couple of quarters ago, we announced the open connector framework, which will allow and allows anybody

Starting point is 00:22:03 to build a connector for metadata for any data system. So this could be a database or a BI tool or a file system. This is an extension of that to now moving into data quality, and you should expect to see us build these open integrations and frameworks over time, because we do think that in the world of data management there has to be a consistent way to sort of share this metadata. And if that doesn't exist, then people are gonna be trapped, you know, at some level relying upon their own systems

Starting point is 00:22:32 because people won't be able to benefit from the knowledge inside of their various software applications. Okay, well, okay. So that's a good segue then for me to go to the last part of the conversation, or the last set of questions I had lined up for you, which basically had to do with your roadmap

Starting point is 00:22:53 and where that specific initiative fits in your roadmap. But since you kind of mentioned interoperability in metadata and all that, I'm tempted to actually ask you a more specialized question. So when I hear the terms metadata and interoperability, well, especially in the same sentence, I'm kind of tempted, inclined to think of a specific technology there. So basically, knowledge graphs and RDF and all that, because well, that's first because I have a background in that. And second, because well, it's kind of, it's kind of obvious. And

Starting point is 00:23:32 it's also kind of trending, let's say, lately. And I also have to share some anecdotal, let's say, experience here. Not too long ago, I was at a conference around those topics, and there were also some people from Alation who were very interested in getting to learn more about that. And we kind of struck up a conversation, and they said that at least at the time, and that was like sometime like four years ago or something, they said that they didn't have any immediate plans of using the technology. I'm wondering if that has changed at all during the time that has transpired since then.

Starting point is 00:24:10 And, well, again, that's a more specific question. And to come back to the more general one, so what's your roadmap going forward? Yeah. So I think there's linked data as a concept and there's linked data as a technology. When we started the company, I had done so actually having done a lot of research into RDF, semantic web, and linked data as a general concept. And we looked, back in 2012, at implementing Alation on top of those frameworks. But what we realized was the big problem in metadata wasn't so much the combination of data or the reasoning that one could build off of getting the metadata. It was just simply acquiring the metadata itself.

Starting point is 00:24:56 On some level, it's the plumbing. And often I describe myself as a plumber, in that I've been doing a lot of plumbing for the last 10 years. And on some level, this open data quality framework is an example of more plumbing. It's allowing people to orchestrate integration into the platform. I do think that over time, you will see us leverage knowledge graphs in order to be able to make inference and recommendation off of data that will exist within the platform. Because that is where I think all customers would love to go.

Starting point is 00:25:28 They'd love to go to a situation where, wow, don't just let me search for something, recommend to me something. Don't tell me how to tag this data, tag it automatically for me. And certainly linked data and AI and machine learning are all elements and ingredients to being able to build a more intelligent data intelligence layer. So I do think there is an opportunity to use those technologies. I think there's a difference between using those technologies as a baseline mechanism to solve some of the very, I would say, unsexy technical problems that exist

Starting point is 00:26:00 where perhaps linked data may not be the right solution for all sort of orchestration and acquisition of the data. But I do think, once you've orchestrated it, once you've acquired it, once you've cleaned it, then linked data can be an extremely powerful technology to do a whole bunch of stuff around recommendation, inference, and association. I think, by the way, that you're already using at least some machine learning, and I'm kind of guessing that you're probably doing that to do things like recommendations for things such as, well, I don't know, which fields people should merge or what else to look at or that kind of thing. That's exactly right. One great example of that is around where we leverage NLP

Starting point is 00:26:46 to be able to do named entity recognition. So we can see a term TXN, recognize that that means transaction, or alternatively, that might mean trucks, and we might be able to then make a recommendation based upon the linguistic model within the company to say which is which. And then we can do that automatically within the framework. And then there are other examples that we're using for our more recent

Starting point is 00:27:12 acquisition, Lyngo, where we are basically allowing people to write English language sentences and convert that into SQL to be able to do interactive interrogation of data sets and queryable data sets. Yeah, well, like you said, when you do the kind of thing that you do, there's a lot of plumbing involved inevitably, but, well, at least you get to build some cool stuff

Starting point is 00:27:38 on top of it too. Yeah, for sure. And I think that's what I'm excited about. I mean, as far as the space goes, yes, you have to basically solve the problems in front of you and in front of your customers. And while it is seemingly unsexy to have to build all of this sort of commonality, kind of the single layer, if you will, with all of these connectors, it enables you to do so many things that allow for acceleration over time. And so on some level, actually on every level, I'm probably more excited about what we're able to do in the next five years than I am about what we've done in the past five, because all of it lays the foundation for just some really cool applications that we'll

Starting point is 00:28:21 start seeing in the near term. I hope you enjoyed the podcast. If you like my work, you can follow Linked Data Orchestration on Twitter, LinkedIn, and Facebook.
