The Data Stack Show - 32: Cooking with Data Ops with Chris Bergh from DataKitchen
Episode Date: April 7, 2021

On this week's episode of The Data Stack Show, Eric and Kostas talk with Chris Bergh, the CEO and head chef at DataKitchen. DataKitchen's mission is to provide the software, service, and knowledge ... that makes it possible for every data and analytics team to realize their full potential with DataOps.

Highlights from this week's episode include:

- Chris' background and how the lessons learned in the Peace Corps and at NASA apply to him today (2:03)
- Why AI left Chris feeling like a jilted lover (7:49)
- Most projects that people do in data analytics fail (10:12)
- Three things that DataOps focuses on (16:37)
- Comparing and contrasting DevOps and DataOps (22:30)
- The types of data that DataKitchen handles and building a product or a service around DataOps (29:29)
- Fixing problems at the source instead of just offering a tool to slightly improve things downstream (37:17)
- Where we are in the process of how companies are going to run on data (41:43)

The Data Stack Show is a weekly podcast powered by RudderStack. Each week we'll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack, visit rudderstack.com.
Transcript
The Data Stack Show is brought to you by RudderStack, the complete customer data pipeline solution.
Thanks for joining the show today.
Welcome back to the show.
We have a really interesting guest today, Chris Bergh from DataKitchen, a really interesting company,
a bootstrapped company in the DataOps space. And that's a category that I think Chris
has really been working hard to define. So one thing that I'm really interested in
is his perspective on what DataOps is. It makes intuitive sense, I think,
for anyone who works in and around data.
But you traditionally hear DevOps,
marketing ops, sales ops, biz ops,
and DataOps is kind of a new term.
So I'm really interested to hear
what he has to say about DataOps.
Costas.
I think we are pretty aligned, Eric.
Actually, that's one of the two main questions that I have.
Like, I'm also interested to see exactly like how DataOps is defined.
That's one thing.
The other thing is, Chris is, I mean, any person who is trying to define a new category is
by definition a visionary, right?
So he's been also like quite a while in the industry.
And I'm really eager to hear his prediction about where the market and the
industry is going around data. Great. Well, let's jump in and chat with Chris.
Chris, so great to have you on the show. We can't wait to hear more about DataKitchen
and the success that you've had. Thank you for joining us.
Oh, I'm really happy to be here. Thank you for the opportunity to talk about DataOps. Yes, absolutely. Well, before we get into the data stuff, which we always do because we have to,
you know, stay honest to the name of the show, I'd love to just hear a little bit about your
background and how did you, you know, what was your pathway into building a company that works
in the DataOps space? Yeah, well, I have a technical background.
So a big part of my career was sort of building software systems at places like MIT and NASA and
Microsoft and a bunch of startups. And I got the bright idea in 2005 that I should go do
data and analytics. And actually, my kids were small, and I thought it would be easy.
I'm a big software guy, and maybe this data stuff,
it's for lesser beings. And you know what? It was actually really hard. I managed teams that did
data, what we call ETL. I had data scientists, people who did data viz, all working for me.
And as a leader, it just, it kind of sucked. Things were breaking left and right. And
I never could go fast enough for my customers. And there's nothing
quite as fun as explaining to a head of sales who has 5,000 pharma sales reps under them
why the data is wrong and why you screwed up, to really make you want to change your perspective.
And then, you know, I'm an engineer by training, I like to innovate, and I hired a
whole bunch of smart people and, you know, some liked R and some
liked Python and some liked Qlik and some liked Tableau and some liked writing SQL and some liked
doing visual tools, but they all wanted to innovate and try something new. And so my life
for many years was how do you go fast and innovate and how do you not break things? And so that
perspective, from both a leader and a technologist, is, we think,
generalizable.
And so we formed a company around it about seven years ago.
And in that time, we've grown and we've had to market and describe the concept as clearly
as we can as engineers.
So we wrote a manifesto.
We hired a writer and actually wrote a book.
And I go out and try to talk about these ideas because I think they're important. And I actually think they solve
a problem that I have and a problem that I see almost every analytic team have.
Sure. You know, one thing, one thing that I mean, Costas and I talk about data all the time
because it's the work that we do day to day, but he has such a good perspective that he always
reminds me, it doesn't really matter what you're dealing with. If you're dealing with data,
it's going to be messy. And that really is true. And I think, you know, I think you spoke to that
in terms of just saying, you know, trying to deal with all of the, all the issues around pipelines
and cleanliness and accuracy and all of that is crazy. So before we get
into DataOps, one thing I like to do, which has kind of become a pattern, and I always monopolize
the beginning of the conversation. So I apologize again for the 30th time, Costas, but I'm interested
to know, one thing that we love asking our guests about is how their early career has influenced what they're doing now
in data. So I know that you worked as a teacher in the Peace Corps early on, and then you also
did some work with NASA, which is really interesting. And that's actually a common
thread among our guests, you know, sort of doing work in a scientific context, but I'd love to know,
are there any lessons from those early experiences that you've carried with you
today as you work with data, even though the context may be pretty different?
Yeah, I think, well, number one, there's this cliche, not all who wander are lost.
And I think it's okay to wander. And certainly
when you're young, getting all careerist right away is, I'm not sure, the right thing to do.
And I learned a lot by, you know, spending two and a half years kind of teaching math in Botswana.
And I actually probably use those skills every day. And it is actually kind of what I do now.
I teach, I talk about these ideas and share them.
And those communication skills, I think are hugely important. And if you
want to advance yourself in a technical career, you quickly learn that, you know,
your technical skills are great, and it's certainly fine to stay an individual contributor
on the technical track, but communication and emotional skills become more important. And so being
able to work across, you know, teams is fundamentally a communication challenge.
And then second, just working at NASA was just a lot of fun. I mean, it was, I was really,
I kind of was really into AI in 1988 and 1989. Like I just loved it. Like I went to graduate
school on it. At that time, it was like
the depths of the AI winter and no one was into it. Like I took a machine learning class and it had
six people in it. Right now, if you go to college campuses, there are hundreds of people trying to
take an ML class. And so I sort of really was into AI for like five years. And we designed
the system to automate air traffic control and
sequence and space aircraft arriving at an airport in an optimal way. And it
was fun, we wrote papers and got systems installed in a bunch of airports. And it was a totally cool project.
And it taught me a lot about how systems work and especially how
intelligent systems work. And so both those experiences, I think, were just great.
So interesting. Okay. So I just have to ask a question here while I'm stealing the stage.
So AI and ML are very hot topics today. And you have a perspective that, you know, sort of
predates like it being really cool, you know, in the 2020s. Tell us, you know, in that ML class,
what are the things that are the same and what are the things that are really different? Because
I think a lot of our listeners, me included, you know, I understand the power of ML and AI,
but it's pretty new to me. And that's, you know, in part because I'm young, but the concepts
themselves are not brand new. You know, they're not only five years old. You know, I don't want
to say I'm an expert on machine learning, but some of the things that have happened in the 20 odd
years are, you know, people learn to train the middle layers of neural networks a lot better,
and it's still back propagation, but the amount of techniques that you have to train a neural
network, and especially the amount of data that you have to train a neural
network, has gotten a lot bigger. And there are some other techniques that have come along, like some
of the ensemble methods that are good. But, you know, my perspective, and I don't want to
set myself up as an expert here, but one of the reasons I left AI, and I left it almost like a jilted lover,
to be honest. And like, I loved it so much, but I got so frustrated with it
because it got asymptotically hard to actually do anything. And so we were trying to,
think of it like self-driving cars, how everyone says they're going to happen. But if you've
watched the latest Tesla video, sort of swerving and almost hitting pedestrians, it's scary, right? And they've been at it for five years. And I felt the same about working at NASA, in that it gets harder
and harder for systems to get more intelligent because they lack the sort of semantic context
that people have. And it just, you know, you can push something. We could push our accuracy of our sequencing and improve it by 1%, but that next 1% was
so hard and actually caused so many understandable problems.
And so finding the right level of automation, we actually backed off the amount of control
we gave, the amount of instructions we gave to air traffic controllers and just told them
less and we got better results.
And so the synthesis of
intelligence with people was really important in my mind. And so when AI got really
hot again, I sort of felt like it's that girlfriend I had in high school.
And suddenly she was like a movie star. I'm like, what, what happened?
Like, oh, I remember you. No, well, she didn't remember me.
That's great.
Okay, well, I'm going to ask one more question and then hand the microphone to Costas.
But thank you for entertaining me.
I just always love to ask some sort of
quasi-personal questions.
But tell us about DataKitchen, your company.
You've been around for a while.
You've seen the data revolution come about,
you know, just in terms of technology and
process.
So just tell us about DataKitchen and what you do.
And then Costas, I'm sure is burning with questions.
Yeah.
So, you know, when I started kind of full-time in data and analytics, I had to explain what
it was to people like, oh, what do you do?
We do charts and graphs.
And they're like, oh, that's nice.
And you talk to people at a dinner party and they'd immediately turn away as if you mentioned that you were like, you know,
a garbage man. They just had no idea what it was and it wasn't that interesting.
And so what's happened the last 15 years are things like Moneyball and the idea that data
is not just some exhaust, it actually is a generator of value. And that's a really important
idea. And in fact, just like people have come to realize that every company is a software company, people are starting to realize that every company has a data and analytics part to it. And the succession of buzzwords of AI and ML and big data have all come in. But what hasn't changed in 15 years is the process of work to build these technically
complicated systems, and whether the data is in batch or streaming or big or small,
you put that data from a lot of sources in one place, and then you do something maybe predictive
on it with some type of model, and maybe you visualize it, and maybe you govern it. And those
aspects, they're better than they were 15 years ago, but they're still the same. And what remains is something quite embarrassing, I think, to the industry: most projects that people do
in data and analytics fail. There's an incredible 50, 60, one analyst at Gartner said 85%, rate of
failure in analytic projects.
And that's just way too high.
And sometimes I bring it up and I feel like I'm kind of saying something embarrassing at a party, but it is really too high.
And what business works with a high failure rate?
And so what is the real cause of that?
And why do these projects fail?
And they fail regardless of the technical superstructure that you worked on, whether it was a small database or a Teradata or a big database, whatever kind of tools you have. Getting the people and the tools they have all to work together
is the hardest thing. And it's a people and process problem. And the people and process
that we should follow has already been discovered and people have actually kind of already figured
out how to do it. And so if you look at the way people made factories 70 years ago and started off with, you know, sort of piece
production and mass production. And finally people like Deming and the Toyota production system,
they figured out a set of principles where a team of people are working on a shared
technically complicated thing, an assembly line. And so they talked about things like safety
culture or theory of constraints,
just-in-time or total quality management. And those are actually accepted now, right? And you
don't want to run a factory without those ideas. You don't want to run it in some Taylorist way,
and you want to make Toyotas and not AMC Pacer cars, which are crappy cars from the 80s near
where I grew up in Wisconsin. And those ideas sort of started
to percolate in the software industry where the Agile Manifesto was written 20 years ago.
First DevOps conference was in 2009. And really, honestly, those ideas are kind of the same. How
do you get a bunch of people who happen to be software developers and people running software
systems to work together on this shared technically complicated thing, the big piece of software, IT thing. And so what occurred to me when I was sort
of suffering in 2005 and 2008, I started to read about Deming and manufacturing. And I was like,
wow, these ideas really apply. And then of course I was steeped in software and agile and DevOps.
And so we started to apply those ideas in that realm
early. So we started to do a lot of test automation. We formed groups around trying to look
for errors and quality. We tried to apply some of these ideas and tried to change the culture to
make it so, you know, people love their errors and don't have shame. And so we built a version and then a second version
of a system that did that. And then we sold that company because the system also had a BI system
and other stuff encased in it. And when we started DataKitchen, we realized that we just could not
work in any other way. And we were sort of committed bootstrappers and we were doing
some work for customers. And then we realized that this way we work is the way everyone should work.
And we're not just nerds to think that
we actually have a better idea than other people.
And this is an idea that, like I said,
people have had in other industries
and we just needed a way to talk about it.
And so we spent literally years trying to figure out
how to describe what was in our head.
Like we called it agile analytic operations.
We called it DevOps for data science. We called it analytic ops for a while, but then if you shorten it down,
it makes a really terrible shortened phrase. And so we, about four or five years ago,
we called it data ops. And then we wrote a manifesto, the book, and we've been trying
to talk about the concept since, and they're all based in just our experience.
This is great, Chris. Actually, you
said something a bit earlier that I really loved. You talked about education and how being in
education helped you, and that actually you're still an educator. And I love that. I think it's
super important, especially with technology. Actually, many times when I'm talking about
marketing in tech, what I'm saying is that's the function marketing should have in tech: to educate people, right?
And I think you were pretty early in many new, let's say, technology trends in the market.
And I'm pretty sure that you are experiencing this also with DataKitchen right now and DataOps. It's almost mandatory to be able to educate the people out there about the ideas that
you have, how the product changed their lives, and how it can be used.
And I'd love for our show here to be a channel for this kind of education.
So let's focus a little bit more on DataOps.
You mentioned a little bit about how you came up with the name, how your experience shaped your decision to create something around DataOps.
What's, let's say, a one-sentence, two-sentence definition of DataOps?
How would you describe that?
Well, I think it's a set of technical practices and cultural norms for data and analytic teams to focus on really three main things. One is being
able to iterate quickly from the ideas in their head and get them into production so they can
get feedback from their customers and learn. The second is to run their factories with very,
very low errors of any kind. And then third is to deal with the fact that your data and analytic teams
are not just one team, but many teams. And so how does your data science team relate to
your centralized data team, to your self-service teams, to your data governance teams? And so it's
about focusing on cycle time, error rates, and collaboration. And all those
things end up, if you get those right, you actually end up being able to produce a lot more insight
for your customer. And you end up being able to have a lot more customer trust. And your team is
actually happier and more productive. Makes total sense. So I have some more questions around that. But before we go there,
you mentioned the DataOps manifesto. And I've seen that before with the Agile manifesto,
for example. What made you go after something like this? Why was it important to come up with
a manifesto for DataOps? So we went to our first conference and we wore chef's hats and gave out
wooden spoons and people just thought we were freaking aliens. They had no idea what the term
DataOps was. They had no idea what DevOps was. They just asked, are you an ETL tool? What are you guys? And
we had paid all this money and it was just embarrassing. And we sort of realized that,
wow, we've got to go back to the beginning. And so we wrote the Wikipedia article after that. And we wrote the manifesto, got some feedback from other people.
And then we realized that we had to write about it in a very clear way. So we were always going
to conferences and discussing, but we had to. And so the ideas, the expression of the ideas
was actually really important. And kind of from a business, we felt like we were creating a software category.
And so we had to do the work. And, you know, I think, you know, thank goodness, we never got
any funding, right? Because it just took a while. And it's still taking a while. Because, you know,
we're sort of the anti lean startup. It's like, you know, we're sort of stuck on this idea,
we know it's right, we got to find the right business to make it happen. But the education part is
really what surprised me. And we've had over 10,000 people sign. We've had 15,000 people read
the book. And I've literally had dozens of people who've read our book and then have gone off to
influence their organization and follow the DataOps principles. And I find it really interesting and really exciting that ideas can change things. And I think that's all it is. And that's what's cool about
what I really like about technical people is we just love learning and we love ideas and we want
to try some stuff. And it's a good idea and you should try it. Absolutely. So how do you write a
manifesto? It sounds like something very revolutionary, let's say.
What's the process?
I mean, you mentioned that it takes time, iterations.
You are the first person that I have met
who has been involved in a manifesto, to be honest.
So I'm really interested to learn more about it.
Well, we stole, literally.
So we started in one of our conference talks,
we had taken the Agile manifesto and removed the word software and put in data and analytics.
And that actually kind of made sense, but it was sort of wrong.
And then we like took it and put it into a Word document, and my co-founder and I started mailing it back and forth.
And a fellow who worked for me also was involved.
And then we just, you know, tried to make it happen. I looked at some things in DevOps, some things in lean, and, you know,
when you live the pain for seven or eight years, or you're continually living the pain,
because at that time we had built a sort of early version of our product. And I
was actually also functioning as a data engineer day to day for a small company. And so we were using our
product and I was doing the data work for a small pharma company. And so I was literally writing
code and feeling the pain myself. And so it's been surprising. I just thought it would
be kind of silly, and we tried to get other people to join in, and some people did and gave some
feedback, but it was mainly sort of us as a company putting it up. And yeah, I mean,
it's marketing, right? So it could just be bullshit, but on the other hand, excuse me, can I say
that online? So maybe you can keep it out. But it also is an expression of really how
we think. Those 18 points are really what we've learned. And so they're true for us.
Yeah. I mean, it's not wrong if something is marketing.
As we said, like marketing is also education.
It's also communication.
So, and it seems like it's a great tool to communicate ideas
and especially in a very early stage, right?
Because, okay, as you said, DataOps was a very new term.
Like you need to communicate that.
You need to create a consensus over what this term means.
And you have to establish this communication.
And I guess having a manifesto and going back and forth
and like talking with the community and agreeing on that,
I think it's a great way to do it
and create a new category, which is great.
Quick question to
get a little bit more technical
around the concept of DataOps.
All this time that we're discussing,
I hear you mentioning Agile and DevOps.
And my understanding is that there are techniques, best practices, methodologies, maybe also
technologies related to these disciplines that are borrowed for DataOps.
And correct me if I'm wrong.
So can you share with us which things from each of the disciplines that inspire
DataOps are the most important?
Yeah, and I think the first one is testing or monitoring, or, as some companies have started
calling it in the last year, observability.
And so that goes to when you have data in production, you're in the squeeze, right?
The data is coming in and you don't know if it's
good or not. And therefore you don't know if your result's good or not. And so you want to make sure
that the data is tested and monitored and correct before your customer sees it. You don't want to
get that call on Friday afternoon saying the data's wrong. And then you're spending all night
Friday trying to fix it, or you're leaving the soccer game on Saturday and your wife's giving
you dirty looks because you just got an email, something's wrong. And by the way,
these things have happened to me and people I work for, and it's not fun. And I think you should
build a system that you know is right, and that tells you if it's right. And to do that,
you've got to go in and grab bits of data, look at them, compare them to previous versions. You've
got to test the size and shape. You've got to look at the artifacts, the models, the visualizations to make sure that they're all right. Because,
if you take the manufacturing analogy a little further, the workstations in the assembly line
are the tools that we use to do the work. And so there's a class of tools to do data work called
ETL or ELT or data prep. There's a class of tools to apply models. There's a class of tools to
visualize. There's a class of tools to govern. All those are sort of workstations that you use,
and data is passing along the assembly line on those workstations. And it doesn't matter if it's
big or small or streaming or batch, you're still having a tool and that tool is governed by code.
And that code has complexity to it, just like software systems
do. So the first thing is that you run a factory, and that's similar, but not quite the same as
software systems. The second, which is more similar, is that analytics is code. That's one of the
lines in the manifesto and your ETL tool may produce an XML file, but that is code equivalent in my mind because it runs in an engine.
Your Viz tool may produce a visualization that's an XML, but that runs in an engine.
You may have SQL code or Python code that's literally code.
And there are some tools that produce YAML files, which are very close to code, or JSON files.
And so you have a code-governed system, right? And so code means complexity.
And so when you're doing data analytics, you're in the complexity business. And software actually
has been in the complexity business for years. That's what it is, is how to deal with all this.
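Since analytics is code, the checks Chris described earlier, grabbing bits of data, comparing them to previous versions, and testing size and shape, can themselves live in code. Here is a minimal sketch in Python, assuming a batch is a simple list of dicts; the names `BatchStats` and `validate_batch` are illustrative, not anything DataKitchen actually ships:

```python
from dataclasses import dataclass


@dataclass
class BatchStats:
    """What we remember about the previous version of the data."""
    row_count: int
    columns: tuple


def validate_batch(rows, previous, max_drift=0.5):
    """Test the size and shape of a new batch against the last one.

    Returns a list of error strings; an empty list means the batch
    passed and is safe to show to a customer.
    """
    errors = []
    # Shape check: the columns should match the previous version.
    columns = tuple(rows[0].keys()) if rows else ()
    if columns != previous.columns:
        errors.append(f"schema changed: {columns} != {previous.columns}")
    # Size check: the row count should be within a drift tolerance.
    low = previous.row_count * (1 - max_drift)
    high = previous.row_count * (1 + max_drift)
    if not low <= len(rows) <= high:
        errors.append(f"row count {len(rows)} outside [{low:.0f}, {high:.0f}]")
    # Content check: no empty values anywhere in the batch.
    for i, row in enumerate(rows):
        if any(v is None or v == "" for v in row.values()):
            errors.append(f"row {i} has empty values")
    return errors
```

A non-empty result would block the publish step, so a bad batch never generates that Friday-afternoon phone call in the first place.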
And one way that software teams deal with complexity is to have a path to production
that is automated. And so one aspect of that path to production
is that they have a development environment
where you can test things
and find out if you've broken anything.
And so you can change something in the middle
and see the effects downstream of it and in development.
And I think that's an incredibly powerful concept. And a lot
of data and analytics teams, most of them, A, aren't testing in development or they're doing
it manually and they don't judge the sort of small effects. And so they end up building processes
like meetings and technology review boards. And so the other thing that software has done,
in addition to testing, is automating the deployment of things from a development environment to production, making that smooth and fast and automated.
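That automated path to production can be reduced to a very small gate: run every test against the development environment, and only copy the artifacts into production when all of them pass. A hypothetical sketch using file-based environments; `promote` and its arguments are assumed names, not any real deployment tool's API:

```python
import shutil
from pathlib import Path


def promote(dev_dir, prod_dir, tests):
    """Promote analytics artifacts from dev to prod, but only if
    every test passes against the dev environment.

    `tests` is a list of callables that take the dev Path and
    return True on success. Returns True if promotion happened.
    """
    dev = Path(dev_dir)
    # Gate: run every check against development first.
    if not all(test(dev) for test in tests):
        return False  # a single failing test blocks the deploy
    # All green: replace production with the tested dev artifacts.
    prod = Path(prod_dir)
    if prod.exists():
        shutil.rmtree(prod)
    shutil.copytree(dev, prod)
    return True
```

In practice the tests would be the same data and artifact checks run in development, so a change that breaks something downstream never reaches the production factory.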
And so some of the same DevOps ideas are there, but DataOps is different. And fundamentally, at a high level, if you boil it all away, Agile says: the thing that you're building, get it in front of your customer quickly and change it, because you don't really know what they want.
Right. And don't spend six months building something, spend six days and then iterate and iterate.
And you thought you had to do 10 things, but you put it in front of your customer.
You learn that they didn't want five, but they wanted two more.
So you're going to have a net gain of three
things that you didn't have to do. And the customer is going to be happy. That's like, to me,
agile in a nutshell. But the problem with data is you've got another cycle. In addition to the
thing that you're giving in front of your customer, you've got the data cycle because the data may not
support what your customer wants. You've got to learn, test, probe, model, experiment on the data. And so you've got
these two cycles that are going on, the application cycle of does it make sense? And you can see that
as charts and graphs or dashboards or however you want to express that. But you've also got the data
and the learning from the data cycle. And both those things, I think, are better done in an
iterative, experimental way, and they have to be coupled together. And that makes DataOps more complicated.
And then finally, you know, software teams have dev and ops.
They're two separate teams and they're usually under the same boss.
In data analytics, there are multiple development teams and multiple operations
teams.
The whole idea of self-service data prep, self-service visualization, and being able to push it to
production. And data science teams are often in their own corner. And we tend to work with big companies
and they tend to have hundreds of people doing data and analytics scattered around the organization.
You could argue that 10% of a company is involved in some form of the process of dealing with data.
And the best companies, in my mind, are companies
like Netflix or Spotify, who are trying to have everyone in the company have some ability to access
the pile of data and get results out of it. And I think that is where we need to go. But
that means that everyone in the company, to a certain extent, is going to be a developer,
is going to create code. And what do you do with that code? Well, it should be in Git,
it should be versioned, it should be tested, it should be deployed, you run a factory in
production, all those things happen, whether you happen to be a full-time, highly paid
$200,000 a year professional, or you happen to be someone with a BA in business who's
helping the business by doing something.
This is great, Chris. Super, super interesting. Actually, when you mentioned at some point the manifesto entry where you say that data is code, I couldn't help it, it reminded me of
the exact opposite, what the Lisp programmers say. I don't know if you are aware of it,
but they actually say that code is data. Anyway, it's just something that comes more from a computer science kind of thing,
because of how the language is.
But this equivalence, what I'm trying to say is that this equivalence between data and
code actually is super, super important.
And we also see that a lot.
So I have a question about data itself.
We are talking a lot about data. You mentioned all these,
let's say, value creation chains, somehow, where data is moving around and it gets processed.
But what data are we talking about? Data can be almost anything, right? What are the
most common types of data that you see your customers using, that you have worked with,
and what do you usually have in your mind when you're talking about DataOps?
Yeah. So we work with companies like, for instance, big pharma companies, and they have groups that do analytics for commercial, like marketing and sales. They have groups that do,
multiple groups that do data and analytics for drug discovery
and like genomic data or experimental data. They have groups that look at manufacturing data,
which is like production and, you know, quality metrics of the drugs they create. And they have
internal teams that look at sort of financial or HR metrics. Or you could look at companies
like financial services companies, right? And they have, you know, teams that are focused on compliance and risk in addition to marketing and sales and internal systems. And so even charitable giving companies, right, who have to keep track of where their
donors are and how much they're donating and what the effect of their marketing campaigns is. So it's
varied. And, you know, it is true, we're not particularly domain specific, because a data
and analytic team will do a lot of the same things
regardless of what type of data it is. But the people who are most interested in data ops are in
areas where the number of questions they have of the data outstrips the supply of the
team able to answer them. That's one issue. And then second is trust: they want to trust the data. And so
a lot of times organizations don't end up trusting the data. And there's a lot of reasons for that.
But those are the two things that we look for in our prospects: the data team's not
keeping up, and they're having just a lot of problems in the assembly line of getting data out.
And then the third is that
they kind of realize that that's a problem that they should fix because that's not always the
case, right? Sometimes they think that they have to live with
that status quo, that their backpack is always going to
be filled with requests from their customers, and they've got to wear it like St. John the Baptist wore a hair shirt and say, like, we've got to suffer.
And our lot in life is to suffer. And like, I just, you know, I just don't think that that's,
you have to live that way. And I found that I was not happy personally living that way and
suffering under late nights and deadlines and kind of not feeling great that I couldn't
satisfy my customer because they always had 10 follow-up questions and we couldn't answer them.
And so the solution to that is not to like look for the new magic tech widget. And I'm an engineer,
right? I love it. And so the solution is to sort of rethink how you and your team work.
And that fundamentally is a leadership question.
And so how do you lead a team to do that?
Do you think that data ops is different when we're talking about data analytics or business intelligence versus when we want to do some work with machine learning, or do the same principles apply in both use cases?
Well, there's, you know, certainly putting ops on the end of a noun is fashionable now.
So there is model ops and ML ops.
And that's, to me, the idea of data ops applied to machine learning.
There's data gov ops, which is a new one.
And we actually helped coin it, which is the application of data governance principles
and ops, sort of like governance as code.
And, you know, I think there are specifics in each domain that are unique, whether you're talking about managing a data catalog and the deployment of changes to a data catalog to production, whether you're actually doing data management, or doing data science.
Like actually there are techniques, specific techniques to monitor compliance of a
model. And there's specific techniques to look at how to understand changes in data. And so I think
all parts of the data and analytic pipeline have an ops thinking. And I tend to bundle those all
under the term data ops, but the market sometimes refers to them differently, saying that data ops refers just to the data warehouse portion,
model ops refers to the model portion,
and the analysts haven't really named the portion that has to do with self-service analytics yet, because self-service ops is too awkward.
Maybe BI ops, maybe AIOps.
Yeah, yeah, makes sense.
All right, last question from me, and then I'll hand the microphone to Eric.
So we talked a lot about DataOps.
How does DataKitchen actually help with that?
And how do you build a product or a service around DataOps?
How did you do it?
So yeah, so we're a product company.
So we have a software product that helps you solve those problems, right? Helps your team deliver more things to your customer.
So you're not burdened; it helps you use your current tools to deliver with fewer errors,
and helps you not end up in the Hatfields and McCoys situation where, you know, your data science team
and your BI team are at each other's throats. And so we do that through software. We also have
some services around it. What we found lately is that our thought leadership is valued and bigger companies
are looking for us to help them with their transformation to do data ops. So big companies
will set up an internal team under the CDO, because
the leadership believes that the core problem isn't going to be solved by buying
yet another tool, that they really have to rethink how they're being agile. And maybe the CDO is
talking to the CIO and the CIO says, yeah, we've gone through the agile DevOps transformation.
And, or maybe, you know, a data engineer is sitting next to someone who works on, you know,
who works on their company's website. And at lunch, that person hits a button and deploys new code to production.
And the data engineer goes, yeah, that takes me three months to do.
And it takes, you know, 300 meetings.
And so the idea is agility in our organization and as a business concept comes from the leadership
down, I think.
It shouldn't, but at least right now,
that's who we target.
And as we grow,
we're going to work more towards
having the individual contributor
who wants to help their organization
move to DataOps just by themselves.
But yeah, right now,
the people who work for us
like to get paid.
And so we have to economically find ways to sell things to people so they can get paid
so they can build software.
Chris, it's so interesting.
One thing that we've seen on the show over and over is that when it comes to running
a data-driven organization and you start to ask people, okay, what does it really take
to do that?
They've never mentioned the tools,
right? They say you really need, I mean, you hit the nail on the head. You really need the
initiative to come from leadership, and then you need alignment across teams. And these are things
that I think many people know and to some extent are intuitive. How do you approach that
problem? Because that's a pretty interesting breadth of problems to solve across an organization. And
part of the problem is organizational itself. And software can't solve that.
Yeah, yeah, I hear you. And maybe, I don't know, you know, I've had a career and I want to,
this is a problem that I know exists and I know it should be solved.
And my fellow nerds are suffering like I suffered, and I want it to be solved for them.
And maybe that sounds goofy, but it is how I feel.
And so it is a big problem, right?
Because we're saying that you should rethink how all those people on your data
and analytic teams work. And that's fundamentally an upstream problem. And so let me give you a
metaphor to explain that. So imagine that you two are sitting by a river on
a nice summer day, and you see some kids in that river drowning and struggling, and you
kind of swim in and grab them and pull them out. And you're like, what's going on? And then you're
sitting on the bank again and some more kids come by and some more kids come by and you suddenly
are always sort of pulling the kids out of the river and they're always sort of like drowning.
And you're like, man. And then someone comes along and offers you a way to get from the shore to the kids faster, and you're going
to go, ah, that's the right thing to do. I got to get the thing that moves me faster to shore so I
can rescue these kids faster. And one of you gets up and starts to walk away. And you're like,
the other one says, what are you doing? Why are you walking away? And he says, you know what,
I'm going to go upstream and tell the kids to stop getting in the river. And so that's the kind of problem it is. It's, you know, a lot of
solutions are about get faster to the drowning kids. And I'm saying, no, the real problem is
you got to walk upstream and stop the kids from getting in the river in the first place.
Sure. Absolutely. What a great analogy. Do you, you know, you started the company in 2013. Is it easier to have that conversation now, with the proliferation of tools and ideas in the space?
Well, what's happened is the amount of knowledge of the techniques that
have been applied to data, whether it's NLP or AI or ML or big data or, you know, Spark or Hadoop,
people have sort of digested those ideas, if you will. There's blogs. And if you
want to find out, there's just a lot more information to learn. And it's been formalized a bit. So, for instance, there's a bunch of master's degree programs in data science and analytics that didn't exist. And so it's much easier, for instance, to find people who have been academically trained in the field than it was 10 years ago. And there was like no one who was academically trained in the field 10 years ago. And so what that means is people are aware of all
the things that you can do, right? And what that means now is that they have a lot more ability to
do things. And they're seeing the problem clearer. Because before, when you're like, I'm only on
second base, and you're like, wow, third base is like machine learning and AI. I want to get there,
man. That's the cool thing. The home run is AI. I want to go there. And people are running really fast, but then they
run around the base. They find out that they're still losing the game. They're like, hmm, you
know, what is it? Is it AI? Is it ML? What's the thing that's going to make this work? And I think
that those are all parts of helping you deliver insight, but really you need to
build a system that helps you deliver insight.
It's about sort of how you work and not what you do; the what could be AI or ML or
visualization or data or whatever it is that you do.
And so to me, it's that I think what's been helpful in the change is that more people
are actually doing it and seeing the problem.
Absolutely. You're a master of analogies, which is great. I don't know, I'm making some up.
The baseball one was a little mixed metaphor there, so I'm not sure that that was perfect.
Hey, it worked for me. That was great. Well, let's talk about the future. We've talked about the past and what's led to today and the way that companies are solving problems and that Data Kitchen supports
them in solving problems around data.
But let's just continue with a baseball analogy.
What inning are we in, in terms of data and specifically the software that supports data
driven organizations?
Oh, so specifically, what inning are we in for data
and the transformation of companies to be data-driven? I think we're probably like in the
second inning, you know, maybe third. It's still early in how companies are going to transform
to run on data. And the idea of data ops and the set of ideas behind it, we're kind of like,
you know, we're in the first part of the first inning in some ways, it's still quite early. More people are interested in it, but it's still
quite early. And so I think the data and analytics industry is, and I've been, you know, I'm fortunate
that I've been able to watch the software industry grow. And I think we're still early and it's a
cool industry in a lot of ways. It's a lot more diverse. The problems are a lot more interesting. And so I'm
still bullish that there's a lot of companies and a lot of good we can do by helping people to be
data-driven. And we can also deal with the negative effects of being data-driven that we've seen in
lots of places from the biases and predictive models to the sort of privacy problems that
come up with data.
And I think all those things are good
signs of a maturing industry.
Sure.
One thing that's interesting
and would love your perspective on this.
So I think that there, to some extent,
for people who work in the technology industry,
specifically sort of in and around Silicon Valley,
geographic or not, but sort of
the ethos of high tech and software, is that leading indicators of a decade-long trend often
show up in pretty big ways. So two things that come to mind when we think about the world of data. So one would be the acquisition of Looker, right?
I mean, that was a really big deal. And Tableau, right?
So you sort of have these significant acquisitions happening in the BI space.
And you can kind of get this sense, especially if you've been in and around data and analytics
for years, like, okay, this is mainstream, right? So like self-service BI is mainstream.
The other one would obviously be Snowflake, you know, which is sort of like, okay, well, warehouse,
you know, data unified in a warehouse, this is mainstream, right? Snowflake went public,
it was massive. And in reality, the long tail of the market is way bigger than the penetration that any of those
companies have achieved, and there are so many companies that just simply aren't operating on
that paradigm. And so I'd just love your perspective. I mean, in many ways, in sort of the Silicon Valley
ethos, that is the standard way of doing things. And a lot of companies are very forward thinking, but there's, I mean, huge percentages of the market that, you know, just aren't
even there yet. And would love your perspective on that.
Oh yeah. And I have this really whacked perspective on it. So hopefully you're not
going to laugh. So I actually think that we've reached peak tool in data and analytics and
Snowflake is the example of it. I mean,
peak tool for a person to use. And why do I say that? So I look at the evolution of tools
for software people. And at some point in the early sort of 98, 99, the pinnacle of cool tools
was a thing called an app server. And there were dozens of app server companies. There was one called BEA WebLogic worth billions of dollars.
There were other companies and it was a tool that people used.
And you know what?
It turned out that those tools got commoditized.
And the things that actually have value now in software
are the tools that make the group of people who do software better. And so you look at the acquisition of GitHub by Microsoft.
It was a significant number.
And so the value has changed.
And so, for instance, a great tool that a lot of software developers use is PyCharm.
It's like an IDE developed by a European company.
And they're, you know,
hundreds of people, maybe a thousand, and they're completely bootstrapped. And I forget the name of
the parent company who does it, but there are probably more
people using that tool than Tableau. And yet Tableau probably sold for $15 billion.
And so I think the market's going to change, because people are going to realize that the value in an analytic team is getting the team working; it's in the ops side
of it, as opposed to what you do.
And so I think it's just going to happen like it happened in software.
The individual contributor tools are going to get commoditized and going to be worth less. And the things that make the team work are where the value is going
to be created. And so to me, that's a long game. And I'm one of the few people who perhaps
expressed that opinion that we've reached peak individual tool. And the fact that Snowflake is
worth 200 times revenue, probably the people at Snowflake are laughing all the way to the bank at me right now. You know,
that's, I saw it happen in software, and it's going to happen, not tomorrow, but it's going to
happen in data and analytics. All right. If anyone from Snowflake is listening to the show,
please email us. And we'd love to have you on to respond to that.
And one piece of advice: sell all your shares, man.
Now and pay your taxes.
Okay.
Chris, now we need a legal disclaimer because, well, I guess we didn't give financial advice.
Okay.
Yeah.
I disclaim that.
It's meant in humor.
No, it's great.
I think JetBrains is the company that...
Thank you. JetBrains, yeah.
Yeah, that makes PyCharm. You know, it's interesting, you know, dbt is a really
interesting example of that, you know, where there's sort of a lot of usage and just sort
of a groundswell of activity that sort of resulted in like a pretty big valuation that they haven't
sold. It makes the relationship between teams much easier. Yeah. And I think
that's a case in point, right? Like who would have thought that Jinja-templated
SQL, that you could build a good company out of it, right? And like, I actually thought it was
like an anti-pattern five years ago that you didn't want to Jinja template all your SQL.
And so the fact that it's gotten so popular is fantastic. And it actually goes to show you that, because a lot of analysts are using SQL, the fact that it's stored in Git, the fact that it
has common components that you can share and reuse, actually is very helpful to people.
Sure. And so I think that that's a case in point about getting a tool that helps the system.
And of course, it helps an individual be productive.
And, you know, I just think it's really interesting, because I was sort of writing the code of our product back then.
And I was Jinja templating SQL.
And I thought that was wrong.
And I don't know.
I'm really... I think I've talked to their CEO once.
I think it's just really cool. I find things that I thought were wrong that
became right are actually a really good indicator of success.
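For readers unfamiliar with the pattern, here is a rough sketch of what Jinja-templated SQL looks like, using the jinja2 Python library (the templating engine dbt builds on, assumed installed). The table name, column names, and threshold are all made-up examples, not anything from DataKitchen or dbt:

```python
from jinja2 import Template  # third-party templating library; assumed installed

# A reusable SQL fragment, the kind of shared component the conversation
# describes. All table and column names here are illustrative.
REVENUE_CTE = """
with revenue as (
    select customer_id, sum(amount) as total_revenue
    from {{ source_table }}
    group by customer_id
)
"""

# Compose the shared fragment into a full query template.
QUERY = Template(REVENUE_CTE + """
select customer_id, total_revenue
from revenue
where total_revenue > {{ min_revenue }}
""")

# Rendering produces plain SQL that can be stored in Git, diffed, and reviewed.
sql = QUERY.render(source_table="raw.orders", min_revenue=100)
print(sql)
```

Because both the template and its rendered output are plain text, the shared fragment can be reused across many models while living in version control alongside the rest of the analytics code.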
Isn't that funny how that works?
I mean, that is kind of an interesting thing in general where I think about the conversations
around the beginnings of Twitch and then pitching investors and people just saying,
this is like the dumbest idea I've ever heard of. And you realize, like, no, it actually
wasn't. Well, they did it right too. You know, they self-funded and they
got traction on their open source tool, and then they got investment. And so I think that's,
you know, I think finding a way to support yourself and your team matters. You know, I'm sort of an anti-blitzscaler.
I believe time actually really helps you.
And so, you know, in my mind, unless you're really sure that you need to blitzscale, which is certainly an honest thing to do, getting funding isn't the right thing.
And certainly if you've got technical skills, you can sell your technical skills and build your product at the same time, which is kind of what we
did. And so it's not actually that hard to financially make a company go. Yeah, it is
interesting. I mean, it is really neat, and that's probably a whole other episode just around
the different ways that some of these tools that have become really big successes found
their beginnings because not all of them are sort of your traditional venture-backed effort.
One other note on dbt, just thinking about patterns that we've seen, and I'm thinking
about a lot of customers that I've worked with in customer success at Rutterstack and
just thinking about the ways that they're using different tooling.
We have sort of the benefit of seeing all the infrastructure and tooling that surrounds their data pipelines.
And one really interesting thing about dbt that I haven't thought a ton about until this
conversation is that the companies running Looker who are heavy dbt
users seem to get a huge amount of value out of Looker because of the underlying work in dbt,
which I think really reinforces your point: in and of itself, it certainly
solves problems, but it actually is a big enabler of teams that are separate from the people
actually using dbt, which has been a very interesting dynamic to see, because some people have the
mindset of, well, I have Looker, I have LookML, I don't necessarily need dbt. But dbt can
be a huge enabler. It's just really interesting to see that dynamic.
Yeah, yeah. And I think both of them are common in that the tools express code that is human
readable, human understandable, editable, and diffable and mergeable, right? And so
that you can put it in Git and you can actually use it. Whereas another generation of ETL tools,
they tend to have these XML blobs or confusing JSON syntax, or they're binary. And so, you know,
if you really think analytics is code, like we wrote in the manifesto, well, it should be treated
as code. It should be in Git and you should be able to diff and merge it. And so I think that's a great way for teams to, because there are people in
analytic teams who are better at thinking in abstract terms and looking at SQL and looking
at code or even templated SQL. And there are some people who just want to have a UI to do it.
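The diffable-and-mergeable property Chris highlights is easy to demonstrate: because transformation logic is plain text, a standard line diff shows exactly what changed between two versions, just as a Git history would. A small sketch with Python's built-in difflib, where the model file name and SQL snippets are invented for illustration:

```python
import difflib

# Two versions of the same (made-up) SQL model, as they might appear
# in two successive Git commits.
old_sql = """select customer_id, sum(amount) as revenue
from raw.orders
group by customer_id
""".splitlines(keepends=True)

new_sql = """select customer_id, sum(amount) as revenue
from raw.orders
where status = 'complete'
group by customer_id
""".splitlines(keepends=True)

# A unified diff, the same format `git diff` would show for this change.
diff = list(difflib.unified_diff(
    old_sql, new_sql,
    fromfile="models/revenue.sql", tofile="models/revenue.sql",
))
print("".join(diff))
```

The diff isolates the single added filter line, which is exactly what makes code review and merging practical; an XML blob or binary artifact describing the same change would be far harder to inspect.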
And so there's another company who's another ELT company called Matillion,
who has a very visual tool that compiles SQL behind the scenes for you. And they're just as
successful as DBT because they make it work. And so I think it's interesting dynamic between the
sort of tools that are closer to code and the tools that are further from code, and where things
are going to come out. And in some ways, the market is almost splitting. There's a lot of sort of low
code, no code, you know, self-service tools out there that you can do data prep, data science.
And then there's tools that intentionally want you to code and do whether it's a Jupyter notebook or,
you know, whether you're doing a dbt model or messing around with your LookML. And so I think,
you know, probably I'm more on the side of things that produce code, because I think code is a
much better way; even LookML files are a much more compact way to understand what's happening in a
system. However, the visual UIs are certainly popular. And so in some
ways the analytic industry is kind of breaking into camps. But at the end of the day, whether it's the low code or code-ish tools,
it's still code. It's still got to be versioned and stored and tested and deployed; it just
runs in a different engine. Yeah. Well, we're getting close to time here and I have so
many more questions to ask, but I love that you are sort of the anti-pattern, the anti-pattern voice. What tools are really exciting to you, you know, that may not
be huge successes yet, but that you think are sort of expressive of the future that you see happening?
Well, I think there's, you know, one of the things that we try to do in creating a category is to
find what the category is.
And so I think there's a bunch of companies who have started around automated testing and production or observability that are exciting.
And of course, our product does that.
There's a bunch of companies over the past three or four years that do model deployments.
And we have capabilities in our product there.
And there's other companies that do automated data governance or data governance as code. And we work with it, but we don't do that. And I think there's real power in the idea of putting as code on the end of things and applying DevOps ideas to them.
And there's a whole sort of movement, I think, or set of ideas that come from software that are playing out into the data and
analytics industry. And MLOps, DataOps, DataGovOps, and data observability are some of them. And the final
one that I actually think is also interesting is the idea of a data mesh, which is really the
application of domain-driven design into data and analytics. And so I think there's actually a ripe way
to think about how software has dealt with complexity
and the ways and methods and how those play out
into systems that are built with data
that I find just incredibly interesting.
And of course, that's what we did.
That's the purpose of our company.
We just sort of stole the DevOps ideas
and said, hey, move them over here.
They're really good.
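As a rough illustration of the automated-testing and observability idea mentioned above (not DataKitchen's actual product, whose internals aren't described here), a minimal data-quality check over a batch of records might look like the following; the fields, rules, and sample data are all hypothetical:

```python
# A minimal sketch of automated data testing, in the spirit of
# "test the pipeline in production." Fields and rules are hypothetical.

def check_batch(rows):
    """Run simple data-quality checks on a batch of order records.

    Returns a list of human-readable failure messages (empty = all passed).
    """
    failures = []
    for i, row in enumerate(rows):
        # Null check on a key column.
        if row.get("customer_id") is None:
            failures.append(f"row {i}: customer_id is null")
        # Type and range check on a numeric column.
        if not isinstance(row.get("amount"), (int, float)) or row["amount"] < 0:
            failures.append(f"row {i}: amount missing or negative")
    # Row-count check: an empty batch often signals an upstream outage.
    if not rows:
        failures.append("batch is empty: possible upstream failure")
    return failures

batch = [
    {"customer_id": 1, "amount": 19.99},
    {"customer_id": None, "amount": 5.00},
    {"customer_id": 3, "amount": -2.50},
]
for msg in check_batch(batch):
    print(msg)
```

Running checks like these on every pipeline run, and alerting when the failure list is non-empty, is the basic mechanic behind the data observability tools Chris describes.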
Hmm, sure. Well, Chris, I'm sad that we're out of time because I have a ton more questions. I'm sure
Costas has a ton more questions, but that just means that we need to have you on the show in
the future, which we now have proven that we actually can do. So we usually say, hey, let's
catch up in six months and see how things are going. And we had our very
first podcast guest on recently from six months ago when we started the show. So I know that we'll
talk again and I'll be interested to see which companies sort of get acquired or IPO in that
time that we can talk about and sort of validate our anti-pattern hypotheses. But thanks so much
for joining us and thanks so much for the insights. It's been really wonderful. All right. Yeah. Thank you for the opportunity and you guys have
a good rest of your day. As always, a great conversation. This is so specific, but since
we try to limit ourselves to one or two things, I wanted to spend so much more time
hearing about doing machine learning, you know, 15 years ago at NASA, trying to do air traffic
control support. I mean, that was just amazing. And it's actually really, really interesting to
me that they had to scale the recommendations from the model back and they got better results,
giving a little bit more control to the humans. But that's probably a whole nother episode. So
that was my big takeaway
and what I'll be thinking about,
which I know is a very small part of the conversation.
So Kostas, hopefully you have a takeaway
that's more relevant to the data conversation.
Yeah, although I think your takeaway
is also like quite important to be honest.
And it's not the first time
that we hear something similar, right?
I think it's a common trend
that it's coming to the surface with all our conversations that we have, especially with
people who are in the middle, that the future is not black and white, like humans or AI, right?
The future is going to be built by the synergies between machine learning, AI, and humans. And
that's something that, I mean, was clear also like 15 years ago. I think that's
the takeaway from Chris. There are a couple of other things that I really enjoyed in our
conversation with him. First of all, okay, it was amazing to hear about DataOps and make it clear
what DataOps is. And I think our audience is going to find this like very interesting.
I really enjoyed the part of the conversation
around marketing and education.
That's super interesting.
I think we should discuss with more people
from tech marketing and especially data-related companies
and see how they market and how important education is.
And the last part is how important collaboration is
when we work with data at the end.
What Chris was saying is that, yeah, I mean, technologies will get commoditized.
The most important technology that we have to build is technology that will help all the people who need to work with the data to work better together.
So, yeah, those are my takeaways.
Well, as always, a great conversation.
Definitely subscribe on your favorite podcast
network in order to get notified of new episodes weekly. And we will catch you next time on the
show. The Data Stack Show is brought to you by Rudderstack, the complete customer data pipeline
solution. Learn more at rudderstack.com.