The Data Stack Show - 32: Cooking with Data Ops with Chris Bergh from DataKitchen

Episode Date: April 7, 2021

On this week's episode of The Data Stack Show, Eric and Kostas talk with Chris Bergh, the CEO and head chef at DataKitchen. DataKitchen's mission is to provide the software, services, and knowledge that make it possible for every data and analytics team to realize their full potential with DataOps.

Highlights from this week's episode include:

- Chris' background and how the lessons learned in the Peace Corps and at NASA apply to him today (2:03)
- Why AI left Chris feeling like a jilted lover (7:49)
- Most projects that people do in data analytics fail (10:12)
- Three things that DataOps focuses on (16:37)
- Comparing and contrasting DevOps and DataOps (22:30)
- The types of data that DataKitchen handles and building a product or a service around DataOps (29:29)
- Fixing problems at the source instead of just offering a tool to slightly improve things downstream (37:17)
- Where we are in the process of how companies are going to run on data (41:43)

The Data Stack Show is a weekly podcast powered by RudderStack. Each week we'll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack, visit rudderstack.com.

Transcript
Starting point is 00:00:00 The Data Stack Show is brought to you by RudderStack, the complete customer data pipeline solution. Thanks for joining the show today. Welcome back to the show. We have a really interesting guest today, Chris Bergh from DataKitchen, a really interesting company, a bootstrapped company in the DataOps space. And that's a category that I think Chris has really been working hard to define. So one thing that I'm really interested in is what his perspective is on what DataOps is. It makes intuitive sense, I think, for anyone who works in and around data.
Starting point is 00:00:48 But you traditionally hear DevOps, marketing ops, sales ops, biz ops, and DataOps is kind of a new term. So I'm really interested to hear what he has to say about DataOps. Costas. I think we are pretty aligned, Eric. Actually, that's one of the two main questions that I have.
Starting point is 00:01:05 Like, I'm also interested to see exactly like how DataOps is defined. That's one thing. The other thing is Chris is, I mean, any person who is trying to define a new category, it's by definition a visionary, right? So he's been also like quite a while in the industry. And I'm really eager to hear his prediction about where the market and the industry is going around data. Great. Well, let's jump in and chat with Chris. Chris, so great to have you on the show. We can't wait to hear more about Data Kitchen
Starting point is 00:01:35 and the success that you've had. Thank you for joining us. Oh, I'm really happy to be here. Thank you for the opportunity to talk about DataOps. Yes, absolutely. Well, before we get into the data stuff, which we always do because we have to, you know, stay honest to the name of the show, I'd love to just hear a little bit about your background and how did you, you know, what was your pathway into building a company that works in the DataOps space? Yeah, well, I have a technical background. So a big part of my career was sort of building software systems at places like MIT and NASA and Microsoft and a bunch of startups. And I got the bright idea in 2005 that I should go do data and analytics. And actually, my kids were small, and I thought it would be easy.
Starting point is 00:02:22 I'm a big software guy guy and maybe this data stuff, it's for lesser beings. And you know what? It was actually really hard. I managed teams that did data, what we call ETL. I had data scientists, people who did data is all working for me. And as a leader, it just, it kind of sucked. Things were breaking left and right. And I never could go fast enough for my customers. And there's nothing quite as fun as explaining to a head of sales who have 5,000 pharma sales reps under them, why the data is wrong and why you screwed up to really kind of want to change your perspective. And then, you know, I, I'm an engineer, but training, I like to innovate and I hired a
Starting point is 00:03:01 whole bunch of smart people and, you know, some liked R and some like Python and some like click and some like Tableau and some like writing SQL and some like doing visual tools and, but they all wanted to innovate and try something new. And so my life for many years was how do you go fast and innovate and how do you not break things? And so that perspective from a leader and a technologist perspective is, we think, is generalizable. And so we formed a company around it about seven years ago. And in that time, we've grown and we've had to market and describe the concept as clearly
Starting point is 00:03:39 as we can as engineers. So we wrote a manifesto. We hired a writer and actually wrote a book. And I go out and try to talk about these ideas because I think they're important. And I actually think they solve a problem that I have and a problem that I see almost every analytic team have. Sure. You know, one thing, one thing that I mean, Costas and I talk about data all the time because it's the work that we do day to day, but he has such a good perspective that he always reminds me, it doesn't really matter what you're dealing with. If you're dealing with data,
Starting point is 00:04:11 it's going to be messy. And that really is true. And I think, you know, I think you spoke to that in terms of just saying, you know, trying to deal with all of the, all the issues around pipelines and cleanliness and accuracy and all of that is crazy. So before we get into dedication, one thing I like to do, which has kind of become a pattern and I always monopolize the beginning of the conversation. So I apologize again for the 30th time, Costas, but I'm interested to know, one thing that we love asking our guests about is how their early career has influenced what they're doing now in data. So I know that you worked as a teacher in the Peace Corps early on, and then you also did some work with NASA, which is really interesting. And that's actually a common
Starting point is 00:04:58 thread among our guests, you know, sort of doing work in a scientific context, but I'd love to know, are there any lessons from those early experiences that you've carried with you today as you work with data, even though the context may be pretty different? Yeah, I think, well, number one, there's this cliche, all who wander are not lost. And I think it's okay to wander. And certainly when you're young, getting all careerist right away is I'm not sure the right thing to do. And I learned a lot by, you know, spending two and a half years kind of teaching math and Botswana. And I actually probably use those skills every day. And it is actually kind of what I do now.
Starting point is 00:05:43 I teach, I talk about these ideas and share them. And those communication skills, I think are hugely important. And if you want to advance yourself in a technical career, you quickly learn that, you know, your technical skills are great. And it's certainly great to stay an individual contributor and technically, but also communication and emotional skills become more. And so being able to work across, you know, teams is a fundamentally a communication challenge. And then second, just working at NASA was just a lot of fun. I mean, it was, I was really, I kind of was really into AI in 1988 and 1989. Like I just loved it. Like I went to graduate
Starting point is 00:06:22 school on it. It was in that time, it was like the winter of the winter and no one was into it. Like I took a machine learning class and it had six people in like right now, if you go to college campuses, there are hundreds of people trying to take an ML class. And it's, and so I sort of really was into AI for like five years. And we designed the system to automate air traffic control and sequence and space aircraft to kind of like into arriving at an airport in an optimal way. And it was fun, wrote papers and got systems installed in a bunch of airports. And it was a totally cool, a totally cool project. And it taught me a lot about how systems work and how especially
Starting point is 00:07:00 intelligent systems work. And so both those experiences, I think were just great. So interesting. Okay. So I just have to ask a question here while I'm stealing the stage. So AI and ML are very hot topics today. And you have a perspective that, you know, sort of predates like it being really cool, you know, in the 2020s. Tell us, you know, in that ML class, what are the things that are the same and what are the things that are really different? Because I think a lot of our listeners, me included, you know, I understand the power of ML and AI, but it's pretty new to me. And that's, you know, in part because I'm young, but the concepts themselves are not brand new. You know, they're not only five years old. You know, I don't want
Starting point is 00:07:50 to say I'm an expert on machine learning, but some of the things that have happened in the 20 odd years are, you know, people learn to train the middle layers of neural networks a lot better, and it's still back propagation, but the amount of techniques that you have to train a neural network, and especially the amount of data that you have to train a neural network has gotten a lot more, has gotten a lot of big, has gotten a lot of bigger. And there are some other techniques that have come along, like some of the ensemble methods that are good. But, you know, my perspective, and I don't want to lay myself as an expert here, but like one of the reasons I left AI, and I left it almost like a jilted lover, to be honest. And like, I loved it so much, but like one of the reasons I left AI and I left it almost like a jilted lover,
Starting point is 00:08:25 to be honest. And like, I loved it so much, but I got so frustrated with it because it got asymptotically hard to actually do anything. Like, and so we were trying to get, think of it like self-driving cars, how everyone says they're going to happen. But if you've watched the latest Tesla video sort of swerve and almost hit pedestrians. It's scary, right? And they've been at it for five years. And I felt the same about working at NASA and that it gets harder and harder for systems to get more intelligent because they lack the sort of semantic context that people have. And it just, you know, you can push something. We could push our accuracy of our sequencing and improve it by 1%, but that next 1% was so hard and actually caused so many understandable problems. And so finding the right level of automation, we actually backed off the amount of control
Starting point is 00:09:17 we gave, the amount of instructions we gave to air traffic controllers and just told them less and we got better results. And so the synthesis of intelligence with people was really important in my mind. And so when I got, when AI got really hot again, I was sort of felt like it's, it's that girlfriend I had in high school. And suddenly she was like a movie star. I'm like, what, what happened? Like, oh, I remember you. No, well, she didn't remember me. That's great.
Starting point is 00:09:50 Okay, well, I'm going to ask one more question and then hand the microphone to Costas. But thank you for entertaining me. I just always love to ask some sort of quasi-personal questions. But tell us about Data Kitchen, your company. You've been around for a while. You've seen the data revolution come about, you know, just in terms of technology and
Starting point is 00:10:05 process. So just tell us about Data Kitchen and what you do. And then Costas, I'm sure is burning with questions. Yeah. So, you know, when I started kind of full-time in data and analytics, I had to explain what it was to people like, oh, what do you do? We do charts and graphs. And they're like, oh, that's nice.
Starting point is 00:10:22 And you talk to people at a dinner party and they'd immediately turn away as if you mentioned that you were like, you know, a garbage man. They just had no idea what it was and it wasn't that interesting. And so what's happened the last 15 years are things like Moneyball and the idea that data is not just some exhaust, it actually is a generator of value. And that's a really important idea. And in fact, just like people have come to realize that every company is a software company, people are starting to realize that every company has a data and analytics part to it. And the succession of buzzwords of AI and ML and big data have all come in different than 15 years ago, is that the process of work to build these technically complicated systems, and whether the data is in batch or streaming or big or small, you put that data from a lot of sources in one place, and then you do something maybe predictive on it of some type of model, and maybe you visualize it, and maybe you govern it. And those
Starting point is 00:11:23 aspects, they're better than they were 15 years ago, but and maybe you govern it. And those aspects, they're better than they were 15 years ago, but they're still the same. And what remains is something quite embarrassing, I think, to the industry is that most projects that people do in data and analytics fail. There's an incredible 50, 60, one analyst at Gartner said 85% rate of failure in analytic projects. And that's just way too high. And sometimes I bring it up and I feel like I'm kind of saying something embarrassing at a party, but it is really too high. And what business works with a high failure rate? And so what is the real cause of that?
Starting point is 00:12:00 And why do these projects fail? And they fail whether you did the technical superstructure that you worked on, whether it was a small database or a Teradata or a big database, whether what kind of tool have and the tools they have all to work together is the hardest thing. And it's a people and process problem. And that the people and process that we should follow has already been discovered and people have actually kind of already figured out how to do it. And so if you look at the way people made factories 70 years ago and started off with, you know, sort of piece production and mass production. And finally people like Deming and the Toyota production system, they figured out a set of principles where a bunch of, a team of people are working on a shared technically complicated thing, an assembly line. And so they talked about things like safety
Starting point is 00:13:03 culture or theory of constraints, just in time or total quality management. And so they talked about things like safety culture or theory of constraints, just-in-time or total quality management. And then those are actually accepted now, right? And you don't want to run a factory without those ideas. You don't want to run it in some Taylorist way, and you want to make Toyotas and not AMC Pacer cars, which are crappy cars from the 80s near where I grew up in Wisconsin. And those ideas sort of started to percolate in the software industry where the Agile Manifesto was written 20 years ago. First DevOps conference was in 2009. And really, honestly, those ideas are kind of the same. How do you get a bunch of people who happen to be software developers and people running software
Starting point is 00:13:41 systems to work together on this shared technically complicated thing, the big piece of software, IT thing. And so what occurred to me when I was sort of suffering in 2005 and 2008, I started to read about Deming and manufacturing. And I was like, wow, these ideas really apply. And then of course I was steeped in software and agile and DevOps. And so we started to apply those ideas in that realm and early. So we started to do a lot of test automation. We did groups around trying to look for errors and quality. We tried to apply some of these ideas and tried to change the culture to make it, you know, people love their errors and not have shame. And so we built a version and then a second version of a system that did that. And then we sold that company because the system also had a BI system
Starting point is 00:14:32 and other stuff encased in it. And when we started Data Kitchen, we realized that we just could not work in any other way. And we were sort of committed bootstrappers and we were doing some work for customers. And then we realized that this way we work is the way everyone should work. And we're not just nerds to think that we actually have a better idea than other people. And this is an idea that, like I said, people have had in other industries and we just needed a way to talk about it.
Starting point is 00:14:56 And so we spent literally years trying to figure out how to describe what was in our head. Like we called it agile analytic operations. We called it DevOps for data science. We called it analytic ops for a while, but then if you shorten it down, it makes a really terrible shortened phrase. And so we, about four or five years ago, we called it data ops. And then we wrote a manifesto, the book, and we've been trying to talk about the concept since, and they're all based in just our experience. This is great chris actually you
Starting point is 00:15:25 said something a bit earlier that i really loved you talked about education and how being in the education helped you and that actually you're still an educator and i love that i think it's super important especially with technology actually many times when i'm talking about marketing in tech what i'm saying is that's the function marketing should have in tech, to educate people, right? And I think you were pretty early in many new, let's say, technology trends in the market. And I'm pretty sure that you are experiencing this also with Data Kitchen right now and DataOps. It's almost mandatory to be able to educate the people out there about the ideas that you have, how the product changed their lives, and how it can be used. And I'd love for our show here to be a channel for this kind of education.
Starting point is 00:16:17 So let's focus a little bit more on DataOps. You mentioned a little bit about how you came about the name, how your experience shaped your decision to create something around DataOps. What's, let's say, one sentence, two sentences, like definition of DataOps? How you could describe that? Well, I think it's a set of technical practices and cultural norms for data and analytic teams to focus on really three main things. One is being able to iterate quickly from the ideas in their head and get it into the production so they can get feedback from their customers and learn. The second is to run their factories with very, very low errors of any cost. And then third is to deal with the fact that your data and analytic teams
Starting point is 00:17:07 are teams and not just one team, but many team. And so how does your data science team relate to your centralized data team, to your self-service teams, to your data governance teams? And so it's about focusing on cycle time, error rates, and collaboration. And all those things end up, if you get those right, you actually end up being able to produce a lot more insight for your customer. And you end up being able to have a lot more customer trust. And your team is actually happier and more productive. Makes total sense. So I have some more questions around that. But before we go there, you mentioned the DataOps manifesto. And I've seen that before with the Agile manifesto, for example. What made you go after something like this? Why it was important to come up with
Starting point is 00:17:59 a manifesto for DataOps? So we went to our first conference and we wore chef's hats and gave out wooden spoons and people just thought we were freaking aliens. They had no idea what the term DataOps is. They had no idea what DevOps was. They just, are you an ETL tool? What are you guys? And we had paid all this money and it was just embarrassing. And we sort of realized that, wow, we've got to go back to the beginning. And so we wrote the Wikipedia article after that. And we wrote the manifesto, got some feedback from other people. And then we realized that we had to write about it in a very clear way. So we were always going to conferences and discussing, but we had to. And so the ideas, the expression of the ideas was actually really important. And kind of from a business, we felt like we were creating a software category.
Starting point is 00:18:45 And so we had to do the work. And, you know, I think, you know, thank goodness, we never got any funding, right? Because it just took a while. And it's still taking a while. Because, you know, we're sort of the anti lean startup. It's like, you know, we're sort of stuck on this idea, we know it's right, we got to find the right business to make it happen. But the education part is really what surprised me. And we've had over 10,000 people sign. We've had 15,000 people read the book. And I've literally had dozens of people who've read our book and then have gone off to influence their organization and to follow the DataOps principles. And I find that really interesting and really exciting that ideas can change. And I think that's all it is. And that's what's cool about what I really like about technical people is we just love learning and we love ideas and we want
Starting point is 00:19:35 to try some stuff. And it's a good idea and you should try it. Absolutely. So how do you write a manifesto? It sounds like something very revolutionary, let's say. What's the process? I mean, you mentioned that it takes time, iterations. You are the first person that I have met who has been involved in the manifesto, to be honest. So I'm really interested to learn more about it. Well, we stole, literally.
Starting point is 00:20:00 So we started in one of our conference talks, we had taken the Agile manifesto and removed the word software and put in data and analytics. And that actually kind of made sense, but it was sort of wrong. And then we like took it and put it into a Word document and started my co-founder and I started mailing it back and forth. And a fellow who worked for me also was involved. And then we just, you know, we, we tried to make it happen. And so there's, I looked at some things in DevOps, some things in lean, and we sort of, you know, it, when you live the pain for seven or eight years, or you're continually living the pain, because at that time I was, we had, we had built a sort of a early version of our product. And I
Starting point is 00:20:39 was actually also functioning as a data engineer day to day for a small company. And so we were using our product and I was doing the data work for a small pharma company. And so I was literally writing code and feeling the pain myself. And so it's been surprising that it's, I just thought it would be kind of silly and we tried to get other people to join in and some people did and gave some feedback, but it was mainly sort of us as a company putting it up. And yeah, I mean, it's marketing, right? So it could just be bullshit, but on the other, excuse me, can I say that online? So maybe you can keep it out, but, but it also is, it's an expression of really how we think those 18 points are really what we've learned. And so they're true for us.
Starting point is 00:21:22 Yeah. I mean, it's not wrong if something is marketing. As we said, like marketing is also education. It's also communication. So, and it seems like it's a great tool to communicate ideas and especially in a very early stage, right? Because, okay, as you said, DataOps was very new term. Like you need to communicate that. You need to create a consensus over what this term means.
Starting point is 00:21:43 And you have to establish this communication. And I guess having a manifesto and going back and forth and like talking with the community and agreeing on that, I think it's a great way to do it and create a new category, which is great. Quick question about, get a little bit more technical around like the concept of data ops.
Starting point is 00:22:00 I hear you all this time that we're discussing and you are mentioning about agile, DevOps. And my understanding is that there are techniques, best practices, methodologies, maybe also technologies that are related with these disciplines that are borrowed for DataOps. And correct me if I'm wrong. So can you share with us what from each one of the disciplines that affect data ops or inspire data ops are the most important? Yeah, and I think the first one is testing or monitoring or some companies who started
Starting point is 00:22:35 in the last year calling it observability. And so that goes to when you have data in production, you're in the squeeze, right? The data is coming in and you don't know if it's good or not. And therefore you don't know if your result's good or not. And so you want to make sure that the data is tested and monitored and correct before your customer sees it. You don't want to get that call on Friday afternoon saying the data's wrong. And then you're spending all night Friday trying to fix it, or you're leaving the soccer game on Saturday and your wife's giving you dirty looks because you just got an email, something's wrong. And by the way,
Starting point is 00:23:08 these things have happened to me and people I work for, and it's not fun. And I think you should build a system that you know that if it's right, and it tells you if it's right. And to do that, you've got to go in and grab bits of data, look at them, compare them to previous versions. You've got to test the size and shape. You've got to look at the artifacts, the models, the visualizations to make sure that they're all right. Because we have, if you take the manufacturing analogy a little further, the workstations in the assembly line are the tools that we use to do the work. And so there's a class of tools to do data work called ETL or ELT or data prep. There's a class of tools to apply models. There's a class of tools to visualize. There's a class of tools to govern. All those are sort of workstations that you use,
Starting point is 00:23:51 and data is passing along the assembly line on those workstations. And it doesn't matter if it's big or small or streaming or batch, you're still having a tool and that tool is governed by code. And that code has complexity to it, just like software systems do. So the first thing is that you run a factory and that's similar, but not quite as similar to software systems. The second is more similar is that it's analytics is code. That's one of the lines in the manifesto and your ETL tool may produce an XML file, but that is code equivalent in my mind because it runs in an engine. Your Viz tool may produce a visualization that's an XML, but that runs in an engine. You may have SQL code or Python code that's literally code.
Starting point is 00:24:34 And there's some tools like that are produced YAML files, which are very close to code or JSON files. And so you have a code governed system. Right. And so code means complexity. And so when you're doing data analytics, you're in the complexity business. And software actually has been in the complexity business for years. That's what it is, is how to deal with all this. And one way that software teams deal with complexity is to have a path to production that is automated. And so one aspect of that path to production is that they have a development environment where you can test things
Starting point is 00:25:10 and find out if you've broken anything. And so you can change something in the middle and see the effects downstream of it and in development. And I think that's an incredibly powerful concept. And a lot of data and analytics teams, most of them, A, aren't testing in development or they're doing it manually and they don't judge the sort of small effects. And so they end up building processes like meetings and technology review boards. And so the other process that software has done, in addition to complexity, in addition to testing, is automating the deployment of things from a development environment to production, making that smooth and fast and automated.
Starting point is 00:25:53 And so DevOps is kind of some of the same ideas are there, but DataOps is different. And fundamentally at a high level, like if you boil it all away, Agile says the thing that you're building, get it in front of your customer quickly and change it because you don't really know what they want. Right. And don't spend six months building something, spend six days and then iterate and iterate. And you thought you had to do 10 things, but you put it in front of your customer. You learn that they didn't want five, but they wanted two more. So you're going to have a net gain of three things that you didn't have to do. And the customer is going to be happy. That's like, to me, agile in a nutshell. But the problem with data is you've got another cycle. In addition to the thing that you're giving in front of your customer, you've got the data cycle because the data may not
Starting point is 00:26:39 support what your customer wants. You've got to learn, test, probe, model, experiment on the data. And so you've got these two cycles that are going on, the application cycle of does it make sense? And you can see that as charts and graphs or dashboards or however you want to express that. But you've also got the data and the learning from the data cycle. And both those things, I think, are better done in an iterative experimental way and they have to be coupled together. And that makes data ops more complicated. And then finally, you know, software teams have dev and ops. They're two separate teams and they're usually under the same boss. In data analytics, there's just, there's multiple development teams and multiple operations
Starting point is 00:27:19 teams. The whole idea of self-service data prep, self-service visualization, and being able to push it to production. And data science teams are often their own corner. And we tend to work with big companies and they tend to have hundreds of people doing data and analytics scattered around the organization. You could argue that 10% of a company is involved in some form of the process of dealing with data. And the best companies in my mind are companies like Netflix, who are trying to have everyone in the company, or Spotify have some ability to access the pile of data and get results out of it. And I think that is where we need to go. But
Starting point is 00:27:59 that means that everyone in the company, to a certain extent, is going to be a developer, is going to create code. And what do you do with that code? Well, it should be in Git, it should be versioned, it should be tested, it should be deployed, you run a factory in production, all those things happen, whether you happen to be a full-time, highly paid $200,000 a year professional, or you happen to be someone with a BA in business who's helping the business by doing something. This is great, Chris. Super, super interesting. Actually, when you mentioned at some point about the manifesto entry where you say that data is code, I couldn't help and it reminded me of like the exact opposite that the list programmers say. I don't know if you are aware of it,
Starting point is 00:28:41 that actually they say that code is data. Anyway, it's just something that comes from more of like a computer science kind of thing because of how the language is. But this equivalence, what I'm trying to say is that this equivalence between data and code actually is super, super important. And we also see that a lot. So I have a question about data itself. We are talking a lot about data itself. We are talking a lot about data. You mentioned like all these, let's say, value creation chain, let's say somehow, where data is moving around, it gets processed.
Starting point is 00:29:14 But what's about what data we are talking about? Data can be almost anything, right? What are the most common types of data that you see your customers are using, you have worked with, and what usually you have worked with, and what usually you have in your mind when you're talking about data ops? Yeah. So we work with companies like, for instance, big pharma companies, and they have groups that do analytics for commercial, like marketing and sales. They have groups that do, multiple groups that do data and analytics for drug discovery and like genomic data or experimental data. They have groups that look at manufacturing data, which is like production and, you know, quality metrics of the drugs they create. And they have
Starting point is 00:29:56 internal teams that look at sort of financial or HR metrics. And, or you could look at companies like financial service companies, right? And then they have, you know, they have teams that are focused on compliance and risk in addition to marketing and marketing and sales and internal systems. And so even charitable giving companies, right, who have to keep track of where their donors are and how much donations and how much the effect of their marketing campaigns. So it's varied. And, you know, there's, it is true. We're not particularly domain specific because a data and analytic team will do some of the same things, a lot of the same things invariant of what type of data it is. But the people who are most interested in data ops are areas where the quest, the amount of questions they have of the data outstrip the supply of the team able to answer it. And that's one issue. And then second is where the tolerance of the team, they want to trust the data. And so
Starting point is 00:31:07 a lot of times organizations don't end up trusting the data. And there's a lot of reasons for that. But those are the two things that we look for and sort of our prospects is like the data team's not keeping up and they're having just a lot of problems in the assembly line of getting data out. And then the third is that they kind of realize that that's a problem that they should fix because that's not always the case, right? Sometimes that is the sort of hair shirt or they think that they have to live with that status quo, that they're always going to be, I have too many, their backpack is always going to be filled with requests from their customers. And they got to wear it like St. John the Baptist wore a hair shirt and say, like, we got to suffer.
Starting point is 00:31:49 And our lot in life is to suffer. And like, I just, you know, I just don't think that that's, you have to live that way. And I find it that I was not happy personally living that way and suffering under late nights and deadlines and kind of not feeling great that I couldn't satisfy my customer because they always had 10 follow-up questions and we couldn't answer them. And so the solution to that is not to like look for the new magic tech widget. And I'm an engineer, right? I love it. And so the solution is to sort of rethink how you and your team work. And that fundamentally is a leadership question. And so how do you lead a team to do that?
Starting point is 00:32:31 Do you think that our data ops different when we're talking about data analytics or business intelligence and when we want to do some work with machine learning or the same principles apply in both use cases? Well, there's, you know, certainly putting ops on the end of a noun is fashionable now. So there is model ops and ML ops. And that's, to me, the idea of data ops is applied to machine learning. There's data gov ops, which is a new one. And we actually helped coin it, which is the application of data governance principles and ops sort of like governance is code. And, you know, I think there are specifics in each domain that are unique to whether you're talking about managing a data catalog and the deployment of changes to a data catalog from production, whether you're actually doing data management or doing data science.
Starting point is 00:33:20 Like actually there are techniques, specific techniques to monitor compliance of a model. And there's specific techniques to look at how to understand changes in data. And so I think all parts of the data and analytic pipeline have an ops thinking. And I tend to bundle those all under the term data ops, but the market sometimes refers to them differently, saying the data ops refers just to the data warehouse data portion. And the data ops refers to the data portion. Model ops refers to the model portion. And the analysts haven't really named the portion that helps to do with self-service analytics yet because self-service ops is too awkward. Maybe BI ops.
Starting point is 00:34:03 Yeah, yeah, makes sense. All right, last question from me, and then I'll hand the microphone to Eric. maybe AIOps. Yeah, yeah, makes sense. All right, last question from me, and then I'll hand the microphone to Eric. So we talked a lot about DataOps. How is DataKitchen actually help with that? And how do you build a product or a service around DataOps? How did you do it?
Starting point is 00:34:22 So yeah, so we're a product company. So we have a software product that helps you solve those problems, right? Helps your team deliver more things to your customer. So you're not burdened, helps you do use your current tools to deliver it with less errors and helps you not sort of end up in the Hatfields and McCoys of, you know, your data science team and your BI team are at each other's throats. And so we do that through software. We also have some services around it. What we found lately is that our thought leadership is valued and bigger companies are looking for us to help them with their transformation to do data ops. So big companies will set up a internal team under the CDO saying, okay, we have to, you know, they believe,
Starting point is 00:35:02 the leadership believes that the core problem isn't going to be solved by buying yet another tool, that they really have to rethink that they're being agile. And maybe the CDO is talking to the CIO and the CIO says, yeah, we've gone through the agile DevOps transformation. And, or maybe, you know, a data engineer is sitting next to someone who works on, you know, who works on their company's website. And at lunch, that person hits a button and deploys new code to production. And the data engineer goes, yeah, that takes me three months to do. And we take, you know, 300 meetings. And so the idea is agility in our organization and as a business concept comes from the leadership
Starting point is 00:35:41 down, I think. It shouldn't, but at least right now, that's who we target. And as we grow, we're going to work more towards having the individual contributor who wants to help their organization move to DataOps just by themselves.
Starting point is 00:35:57 But yeah, right now, we have to pay people who work for us like to get paid. And so we have to economically find ways to sell things to people so they can get paid so they can build software. Chris, it's so interesting. One thing that we've seen on the show over and over is that when it comes to running a data-driven organization and you start to ask people, okay, what does it really take
Starting point is 00:36:23 to do that? They've never mentioned the tools, right? They say you really need, I mean, you hit the nail on the head. You really need the initiative to come from leadership, and then you need alignment across teams. And these are things that I think that many people know and to some extent are intuitive, How do you approach that problem? Because that's a pretty interesting breadth of problems to solve across an organization. And part of the problem is organizational itself. And software can't solve that. Yeah, yeah, I hear you. And maybe, I don't know, you know, I've had a career and I want to,
Starting point is 00:37:24 this is a problem that I know exists exists and I know it should be solved. And my fellow nerds are suffering like I'm, like I suffered and I want it to be solved for them. And maybe that sounds goofy, but it is how I feel. And so it is a big problem, right? Because we're saying that you should rethink how all those people on your data and analytic teams work. And that's fundamentally an upstream problem. And so let me give you a metaphor to explain that. So imagine that you two are sitting by a river and on that river, your nice summer day, you see some kids in that river sort of drowning and are struggling and you
Starting point is 00:38:04 kind of swim in and grab them and pull them out. And you're like, what's going on? And then you're sitting on the bank again and some more kids come by and some more kids come by and you suddenly are always sort of pulling the kids out of the river and they're always sort of like drowning. And you're like, man, this is, and someone comes along and offers you a way to get faster from the shore to the kid, you're going to go, ah, that's the right thing to do. I got to get the thing that moves me faster to shore so I can rescue these kids faster. And one of you gets up and starts to walk away. And you're like, the other one says, what are you doing? Why are you walking away? And he says, you know what, I'm going to go upstream and tell the kids to stop getting in the river. And so that's the kind of problem it is. It's, you know, a lot of
Starting point is 00:38:50 solutions are about get faster to the drowning kids. And I'm saying, no, the real problem is you got to walk upstream and stop the kids from getting in the river in the first place. Sure. Absolutely. What a great analogy. What a great analogy. Do you, you know, you started the company in 2013. Is it easier to have that conversation now because of the proliferation of happened is the amount of knowledge of the techniques that have applied to data, whether it's NLP or AI or ML or big data or, you know, Spark or Hadoop or people have sort of digested those ideas, if you will. The market is, there's blogs. And if you want to find out, there's just a lot more information to learn. And it's been formalized a bit. So, for instance, there's a bunch of master's degree programs in data science and analytics that didn't exist. And so it's much easier, for instance, to find people who have been academically trained in the field than it was 10 years ago. And there was like no one who was academically trained in the field 10 years ago. And so what that means is people are aware of all the things that you can do, right? And what that means now is that they have a lot more ability to do things. And they're seeing the problem clearer. Because before, when you're like, I'm only on
Starting point is 00:40:19 second base, and you're like, wow, third base is like machine learning and AI. I want to get there, man. That's the cool. The home run is AI. I want to go there. And people are running really fast, but then they run around the base. They find out that they're still losing the game. They're like, hmm, you know, what is it? Is it AI? Is it ML? What's the thing that's going to make this work? And I think that those are all parts of helping you deliver insight, but really you need to build a system that helps you deliver insight. It's about sort of how you work and not what you do and that what could be AI or ML or visualization or data or whatever that you do.
Starting point is 00:40:58 And so to me, it's that I think what's been helpful in the change is that more people are actually doing it and seeing the problem. Absolutely. You're a master of analogies, which is great. I don't know, I'm making some up. The baseball one was a little mixed metaphor there, so I'm not sure that that was perfect. Hey, it worked for me. That was great. Well, let's talk about the future. We've talked about the past and what's led to today and the way that companies are solving problems and that Data Kitchen supports them in solving problems around data. But let's just continue with a baseball analogy. What inning are we in, in terms of data and specifically the software that supports data
Starting point is 00:41:40 driven organizations? Oh, so specifically, what inning are we in for, for data and the transformation of companies to be data-driven? I think we're probably like in the second inning, you know, maybe third, it's still early in how companies are going to transform to, to run on data and the idea of data ops and the set of ideas behind it. We're kind of like, you know, we're in the first part of the first inning in some ways, it's still quite early. More people are interested in it, but it's still quite early. And so I think the data and analytics industry is, and I've been, you know, I'm fortunate that I've been able to watch the software industry grow. And I think we're still early and it's a
Starting point is 00:42:19 cool industry in a lot of ways. It's a lot more diverse. The problems are a lot more interesting. And so I'm still bullish that there's a lot of companies and a lot of good. It's a lot more diverse. The problems are a lot more interesting. And so I'm still bullish that there's a lot of companies and a lot of good we can do by helping people to be data-driven. And we can also deal with the negative effects of being data-driven that we've seen in lots of places from the biases and predictive models to the sort of privacy problems that come up with data. And I think all those things are good in sign of a maturing industry.
Starting point is 00:42:50 Sure. One thing that's interesting and would love your perspective on this. So I think that there, to some extent, for people who work in the technology industry, specifically sort of in and around Silicon Valley, geographic or not, but sort of the ethos of high tech and software, is that leading indicators of a decade-long trend often
Starting point is 00:43:17 show up in pretty big ways. So two things that come to mind when we think about the world of data. So one would be the acquisition of Looker, right? I mean, that was a really big deal and Tableau, right? So you sort of have these significant acquisitions happening in the BI space. And you can kind of get this sense, especially if you've been in and around data and analytics for hours, like, okay, this is mainstream, right? So like self-service BI is mainstream. The other one would obviously be Snowflake, you know, which is sort of like, okay, well, warehouse, you know, data unified in a warehouse, this is mainstream, right? Snowflake went public, it was massive. And in reality, the long tail of the market is way bigger than the penetration that any of those
Starting point is 00:44:08 companies have achieved and there are so many companies that just simply aren't operating on that paradigm and so i just love your perspective i mean in many ways in sort of the silicon valley ethos like that is the standard way of doing things. And a lot of companies are very forward thinking, but there's, I mean, huge percentages of the market that, you know, just aren't even, aren't even there yet. And would love your perspective on that. Oh yeah. And I have this really whacked perspective on it. So hopefully you're not going to laugh. So I actually think that we've reached peak tool in data and analytics and Snowflake is the example of it. And there, you know, I think, and peak tool for a person to use. And why do I say that? So I look at the evolution of tools
Starting point is 00:44:52 for software people. And at some point in the early sort of 98, 99, the pinnacle of cool tools was a thing called an app server. And there were dozens of app server companies. There was one called BEA WebLogic worth billions of dollars. There were other companies and it was a tool that people used. And you know what? It turned out that those tools got commoditized. And the things that actually have value now in software are the tools that make the group of people who do software better. And so you look at the acquisition of GitHub by Microsoft. It was a significant number.
Starting point is 00:45:33 And so the value has changed. And so, for instance, a great tool that a lot of software developers do is PyCharm. It's like an IDE developed by a European company. And they're like, you know, hundreds of, you know, 1000 people, and they're completely bootstrapped. And, you know, I would argue that they have more people developing within, you know, it's PyCharm. And I forget the name of the parent company who does it, perhaps, perhaps, you know, but they've got, there are probably more people using that tool than Tableau. And yet Tableau probably sold for $15 billion.
Starting point is 00:46:09 And so I think the market's going to change because people are going to realize that the value in analytic team working is getting the team working is getting in the ops side of it as opposed to what you do. And so I think, and it's just going to happen like in, it happened in software. The individual contributor tools are going to get commoditized and going to be worth less. And the things that make the team work are where the value is going to be created. And so to me, that's a long game. And I'm one of the few people who perhaps expressed that opinion that we've reached peak individual tool. And the fact that Snowflake is worth 200 times revenue, probably the people at Snowflake are laughing all the way to the bank at me right now. You know, that's, I saw it happen in software and it's going to happen and not tomorrow, but it's going to
Starting point is 00:46:53 happen in data and analytics. All right. If anyone from Snowflake is listening to the show, please email us. And we'd love to have you on to respond to that. And one advice, sell all your shares, man. Now and pay your taxes. Okay. Chris, now we need a legal disclaimer because, well, I guess we didn't give financial advice. Okay. Yeah.
Starting point is 00:47:16 I just claim that. It's men in humor. No, it's great. I think JetBrains is the company that- Thank you. JetBrains. Yeah. Yeah. That makes PyCharm. You know, it's great. I think JetBrains is the company that... Thank you, JetBrains. Yeah. Yeah, that makes PyCharm. You know, it's interesting, you know, DBT is a really
Starting point is 00:47:29 interesting example of that, you know, where there's sort of a lot of usage and just sort of a groundswell of activity that sort of resulted in like a pretty big valuation that they haven't sold. It makes the relationship between teams much easier. Yeah. And I think that's a case in point, right? Like who would have, who would have thought that Jinja templated SQL would, you could build a good company out of it. Right. And like, I actually thought it was like an anti-pattern five years ago that you didn't want to Jinja template all your SQL. And so the fact that it's gotten so popular is fantastic. And it actually goes to show you that like having, because a lot of analysts are using SQL and the fact that it's stored in Git, the fact that it's actually has common components that you can share and reuse actually is very helpful to people.
Starting point is 00:48:18 Sure. And so I think that that's a case in point about why the system, getting a tool that helps the system. And of course, it helps an individual be productive. But, you know, if you're, you know, and I just think it's really interesting because I was sort of writing the code of our product back then. And I was Jinja templating SQL. And I thought that was wrong. And I don't know. I'm really, I think I've talked to them, their CEO once. I think it's just really cool that that became, I find things that I thought were wrong that
Starting point is 00:48:46 became right as actually a really good indicator that it's a success. Isn't that funny how that works? I mean, that is kind of an interesting thing in general where I think about the conversations around the beginnings of Twitch and then pitching investors and people just saying, this is like the dumbest idea I've ever heard of. And you realize like, no, it actually was, it wasn't. Well, they did it right too. You know, they bootstrapped from, they self-funded and they got, they got traction on their open source tool and they got investment. And so I think that's, you know, I think finding a way to support yourself and your team and having, you know, I'm a sort of an anti-blitz scaler.
Starting point is 00:49:29 I believe time actually really helps you. And so, you know, getting funding is not, in my mind, you know, unless you're really sure that you need to blitz scale, which is certainly an honest thing that you should do, getting funding isn't the right thing. And certainly if you've got technical skills, you can sell your technical skills and build your product at the same time, which is kind of what we did. And so it's not actually that hard to financially make a company go. Yeah, it is interesting. I mean, it is really neat to, and that's probably a whole other episode just around the different ways that some of these tools that have become really big successes found their beginnings because not all of them are sort of your traditional venture-backed effort. One other note on dbt, just thinking about patterns that we've seen, and I'm thinking
Starting point is 00:50:15 about a lot of customers that I've worked with in customer success at Rutterstack and just thinking about the ways that they're using different tooling. We have sort of the benefit of seeing all the infrastructure and tooling that surrounds their data pipelines. And one really interesting thing, thinking about DBT that I haven't thought a ton about until this conversation is that the companies who are really, the companies running Looker who are heavy DBT users seem to get a huge amount of value out of Looker because of the underlying work on dbt, which I think really reinforces your point of it's an, it, in and of itself, it certainly like solves problems, but it actually is a big enabler of teams that are separate from the people
Starting point is 00:50:59 actually using dbt, which has been a very interesting dynamic to see because some people have the you know mindset of like well i have looker i have look ml i don't necessarily need dbt but dbt can be a huge enabler it's just really interesting to see that dynamic yeah yeah and i think both of both of them are common in that they they think of they the tools express code that is human readable human understandable editable and diff and mergeable, right? And so that you can put it in Git and you can actually use it. Whereas another generation of ETL tools, they tend to have these XML blobs or confusing JSON syntax or their binary. And so, you know, if you really think analytics is code, like we wrote in the manifesto, well, it should be treated
Starting point is 00:51:41 as code. It should be in Git and you should be able to diff and merge it. And so I think that's a great way for teams to, because there are people in analytic teams who are better at thinking in abstract terms and looking at SQL and looking at code or even templated SQL. And there are some people who just want to have a UI and so to do it. And so there's another company who's another ELT company called Matillion, who has a very visual tool that compiles a SQL behind the scenes for you. And they're just as successful as DBT because they make it work. And so I think it's interesting dynamic between the sort of tools that are closer to code and the tools that are different to code and where things are going to come out. And in some ways, the market is almost splitting. There's a lot of sort of low
Starting point is 00:52:29 code, no code, you know, self-service tools out there that you can do data prep, data science. And then there's tools that intentionally want you to code and do whether it's a Jupyter notebook or, you know, whether you're doing a DBT model or messing around with your look ml and so i think you know probably i'm more on the things that produce code because i like i think code is a much better way even look ml files are much more compact way to understand what's happening in a system however the visual uis are certainly possible are certainly popular and so in some ways the analytic industry is kind of like breaking into camps. But at the end of the day, whether it's the low code or code-ish tools, it's still code. It's still got to be versioned and stored and tested and deployed just because
Starting point is 00:53:15 it just runs in a different engine. Yeah. Well, we're getting close to time here and I have so many more questions to ask, but I love that you are sort of the anti-pattern, the anti-pattern voice. What tools are really exciting to you, you know, that may not be huge successes yet, but that you think are sort of expressive of the future that you see happening? Well, I think there's, you know, one of the things that we try to do in creating a category is to find what the category is. And so I think there's a bunch of companies who have started around automated testing and production or observability that are exciting. And of course, our product does that. There's a bunch of companies that over the past three or four years that does model deployments.
Starting point is 00:54:01 And we have capabilities in our product there. And there's other companies that do automated data governance or data governance as code. And we work with it, but we don't do that. And I think the idea of thinking of things as putting the as code on end and thinking of them applying DevOps ideas. And there's a whole sort of movement, I think, or set of ideas that come from software that are playing out into the data and analytics industry. And MLOps, DataOps, DataGovOps, data observability are one of them. And the final one that I actually think is also interesting is the idea of a data mesh, which is really the application of domain-driven design into data and analytics. And so I think there's actually a ripe way to think about how software has dealt with complexity and the ways and methods and how those play out
Starting point is 00:54:51 into systems that are built with data that I find just incredibly interesting. And of course, that's what we did. That's the purpose of our company. We just sort of stole the DevOps ideas and say, hey, move them over here. They're really good. Hmm, sure. Well, Chris, I'm sad that we're out of time because I have a ton more questions. I'm sure
Starting point is 00:55:11 Costas has a ton more questions, but that just means that we need to have you on the show in the future, which we now have proven that we actually can do. So we usually say, hey, let's catch up in six months and see how things are going. And we had our very first podcast guest on recently from six months ago when we started the show. So I know that we'll talk again and I'll be interested to see which companies sort of get acquired or IPO in that time that we can talk about and sort of validate our anti-pattern hypotheses. But thanks so much for joining us and thanks so much for the insights. It's been really wonderful. All right. Yeah. Thank you for the opportunity and you guys have a good rest of your day. As always a great conversation. This is so specific, but since
Starting point is 00:55:54 we try to limit ourselves to one, one or two things, I wanted to spend so much more time hearing about doing machine learning, you know, 15 years ago at NASA, trying to do air traffic control support. I mean, that was just amazing. And it's actually really, really interesting to me that they had to scale the recommendations from the model back and they got better results, giving a little bit more control to the humans. But that's probably a whole nother episode. So that was my big takeaway and what I'll be thinking about, which I know is a very small part of the conversation.
Starting point is 00:56:31 So Kostas, hopefully you have a takeaway that's more relevant to the data conversation. Yeah, although I think your takeaway is also like quite important to be honest. And it's not the first time that you hear something similar, right? I think it's a common trend that it's coming to the surface with all our conversations that we have, especially with
Starting point is 00:56:49 people who are in the middle, that the future is not black and white, like humans or AI, right? The future is going to be built by the synergies between machine learning, AI, and humans. And that's something that's, I mean, it was clear also like 15 years ago. I think that's the takeaway from Greece. There are a couple of other things that I really enjoyed in our conversation with him. First of all, okay, it was amazing to hear about DataOps and make it clear what DataOps is. And I think our audience is going to find this like very interesting. I really enjoyed the part of the conversation around marketing and education.
Starting point is 00:57:28 That's super interesting. I think we should discuss with more people from tech marketing and especially data-related companies and see how they market and how important education is. And the last part is how important collaboration is when we work with data at the end. What Chris was saying is that, yeah, I mean, technologies will get commoditized. The most important technology that we have to build is technology that will help all the people who need to work over the data to work better together.
Starting point is 00:57:58 So, yeah, those are my takeaways. Well, as always, a great conversation. Definitely subscribe on your favorite podcast network in order to get notified of new episodes weekly. And we will catch you next time on the show. The Data Stack Show is brought to you by Rudderstack, the complete customer data pipeline solution. Learn more at Rudderstack.com. you
