The Changelog: Software Development, Open Source - Productionising real-world ML data pipelines (Interview)
Episode Date: February 14, 2020. Yetunde Dada from QuantumBlack joins Jerod for a deep dive on Kedro, a workflow tool that helps structure reproducible, scalable, deployable, robust, and versioned data pipelines. They discuss what Kedro's all about and how it's "changing the landscape of data pipelines in Python", the ins/outs of open sourcing Kedro, and how they found early success by sweating the details. Finally, Jerod asks Yetunde about her passion project: a virtual reality film which debuted at the Sundance Film Festival in January.
Transcript
Bandwidth for ChangeLog is provided by Fastly. Learn more at Fastly.com.
We move fast and fix things here at ChangeLog because of Rollbar.
Check them out at Rollbar.com.
And we're hosted on Linode cloud servers. Head to Linode.com slash ChangeLog.
This episode is brought to you by DigitalOcean.
DigitalOcean's developer cloud makes it simple to launch in the cloud and scale up as you grow.
They have an intuitive control panel, predictable pricing, team accounts, worldwide availability with a 99.99 uptime SLA, and 24-7, 365 world-class support to back that up.
DigitalOcean makes it easy to deploy, scale, store, secure, and monitor your cloud environments.
Head to do.co slash changelog to get started with a $100 credit. Again, do.co slash changelog.
What's up? Welcome back, everyone. This is the Changelog, a podcast featuring the hackers,
the leaders, and the innovators in the world of software development. I'm Adam Stachowiak, Editor-in-Chief here at Changelog.
On today's show, we're talking with Yetunde Dada from QuantumBlack for a deep dive on Kedro.
Kedro is a workflow tool that helps structure reproducible, scalable, deployable, robust,
and versioned data pipelines.
We cover what Kedro is all about and how it's changing the landscape of data pipelines in Python,
the ins and the outs of open sourcing Kedro, and how they found early success by sweating the details.
We also talked with Yetunde about her passion project, a virtual reality film which debuted
at the Sundance Film Festival in January.
So Yetunde, we're here to talk about Kedro, which is a Python library dealing with data pipelines.
First of all, thanks so much for joining us on the show.
Thank you so much for having me.
And we should give a thanks to Waylon Walker, who requested this episode just recently.
Actually, sometimes our requests take months, maybe even years for us to put the show together.
This one was relatively recent.
I should mention to our listeners that we do take requests at changelog.com slash request. If you have a topic or a guest or anything you'd like to hear on the show,
just head over there and let it be known. We're happy to make shows that y'all want to listen to.
So we have Yetunde here, who is the product manager at Quantum Black and is on this Kedro
project. Tell us about Quantum Black and what y'all are up to.
Sure.
So Quantum Black is an advanced analytics company
that was acquired by McKinsey a few years ago.
So we're in the consulting gig.
And we basically think of ourselves as
the black ops teams
that go out to different companies around the world
and across many different industries.
And what we do is effectively deliver functional machine learning code in production. So kind of
like our success model looks at not only do we solve very difficult use cases that have very
challenging design problems, but we also have clients that are able to maintain their own code bases when we go. So that's what we do.
And Kedro is your first open source product, or at least that's what I read.
Tell me about the experience.
Why is it open source and all that stuff?
It's actually very interesting that you've asked that.
Kedro, yes, is the first open source product that we've ever had coming out of McKinsey
and Quantum Black.
It was quite an experience open sourcing it because it's very difficult to get corporate
open source rolling, especially in a place where it hasn't happened before.
We open sourced Kedro purely for client need. So what would happen effectively was that our
data science teams would go out and use Kedro on their engagements wherever they were going for their consulting work.
And they would encounter, obviously work with the client's data scientists and data engineers, and find that everyone really enjoys using Kedro.
They suddenly have questions around what do we do to keep on using the tool after we leave?
And the answer to us became, let's actually open source this tool so that our clients have access to it. And they'll be able to access upgrades, they'll be able to access an open source community that can help further engage with them as they use the tool. So that became the primary lever for us open sourcing. But we're very excited about open source at Quantum Black. We just recently open sourced another tool called CausalNex. CausalNex is a causality data science library, which helps
somewhat tackle that problem between causality and correlation when you're busy assessing
different data sets. We think of it as kind of like a way to really give back to the community
and really stake a claim in some of the thought leadership pieces that we maintain at Quantum Black.
We have a very exciting data science R&D function that is quite active with trying to solve issues around explainability, around fairness, around live model performance tracking.
And really being able to share snippets of that knowledge, I think, is very exciting for us. That is very exciting. So I should mention that Waylon,
when he requested the show,
he did say that Kedro is changing the landscape of data pipelines in Python.
I'm curious if you agree with that.
And then secondly, if you think,
I mean, it seems like to me,
if it wasn't open source,
that could not be the case.
Maybe it could be anyways.
But I'm wondering your thoughts
on the open sourcing of Kedro
and how that has helped it
to perhaps change this
landscape, as he says? I'm so excited that Waylon finds that Kedro is changing the landscape for
data science. So maybe I should actually describe what Kedro does. Please do. So we think of Kedro
as your way to apply software engineering best practice to your data science and data engineering
code. So it follows this
whole principle that if we get everyone operating at a higher standard for software engineering
best practice, we make code that is reproducible, it's deployable, it's versioned, and it makes it
easier to put that code into production when it's time to go live on models. It kind of fits into
this whole problem space where for the longest time,
data scientists have used code to solve the problem
as opposed to seeing code as the end goal,
which is what is required for that code to be functional
in the machine learning practice
and to have value for business.
So Kedro essentially says, you have your standard way of working, we're just going to make some slight modifications to it so that you have more robust code. And in the end, you get this production-ready code. So I'm really excited that he says that we're changing the basis of how data science should be done, because we believe in getting to the heart of users, really having empathy for your workflow, and making it easy to enforce software engineering best practice using Kedro. Other exciting things that
are happening because teams are using Kedro from what we've seen from like the open source community
is that people are I think following the trend of creating reusable analytics code as well
because there's a certain workflow consistency that happens when you have well structured code. And they're now able to build their own reusable analytics libraries on top
of Kedro, which has been quite exciting to see. So I was digging through Kedro docs as I was
trying to figure out exactly what is this thing that I'm looking at, which is a fun endeavor for
many open source projects. It's like, okay, I see what's written on the label. What exactly is going
on here? First of all, the documentation is really good, which is indicative of the Python
community. I'm not sure about the data science community, but y'all are killing it there. And
the Hello World is a very nice example of knowing exactly what Kedro starts you off with.
And as I was reading it, it seemed, to me as a software developer, more akin to a conventional framework.
A thing that establishes norms, gives you buckets, very much kind of a convention over configuration concept of, here's where this goes, put your pipelines here, put your data here, et cetera, et cetera.
And if we all kind of follow these norms, life is better, life is smoother, it's more
production ready. Is that kind of the idea?
That's actually completely the idea that it's based on.
You'll note that, I don't know if you're familiar with a tool called Cookiecutter.
Cookiecutter follows this whole methodology.
It's another open source project that if we work in a standard way,
so if we have the same setup in terms of our directories
and where we place certain files,
it guarantees workflow consistency
that makes it easier for you
to work with yourself in future
if you have to look back at your code
and also easier for other people
to work with you
because in a sense,
your code becomes self-documenting
because you know where things are stored.
There's a derivative of Cookiecutter called Cookiecutter Data Science.
And Kedro is actually built on top of Cookiecutter Data Science,
but takes into account how teams also need to construct data and ML pipelines,
as well as also thinking about things like how do we do data abstraction
when trying to load data?
Because there's many different ways of loading data using different Python libraries.
And also things like how do we do versioning of these data sets or models themselves so that we can actually reproduce runs. Because
the complexity of reproducing code is different. Well, if you were building a software application, reproducing a previous version of the code is simple; it might just be having a look at the logs to see what happened when a failure happened, and then having a look at the code that created that, and any other inputs that you have. In machine learning, that becomes a bit more complex because of the different variables that would have made that machine learning pipeline run, for instance. And Kedro kind of tries to, I think, I did say it's basically software engineering best practice. It does try to implement those good sense principles in the data science world.
So that's awesome and definitely something that is needed in the data science world.
As has been reported to me; I'm not living there, so I'm just going based on what you're saying and what people have also said on Practical AI, our other show about machine learning.
Now, I'm coming from the other direction, so I have a software background and less of a machine learning background. And I'm not going to speak for our audience,
but I'm sure there's plenty of them out there who are in a similar place as me. So when I look at
Kedro, I see the conventions, I see the best practices in place. And I'm like, yes, this
makes total sense. But then I get to the other bits that I'm less familiar with. And I start to
wonder like, what are these things and how do they fit together for me to implement something in production and make it useful? So
when we start talking about data pipelines, nodes, etc, etc, can you explain some of the jargon,
perhaps, and some of the things involved in starting a Kedro project and maybe taking one
into a production application?
Okay, so getting started with Kedro means after you've pip installed the library, simply running a CLI command, kedro new, will have you creating your project template. That's actually where the Hello World example is based.
So you can get started in like less than 30 seconds with your first Kedro project.
If you execute another command after you've created your project template,
Kedro run, you would have made your first Kedro pipeline run.
So when I break down the components that are needed to actually understand Kedro, there are five. First, we have the project template, which is basically our series of files and folders that are generated by the Cookiecutter Data Science
template. And it's kind of like best practice for where do I store bits and parts of my code. There are quite a few directories, covering everything from where you put your Jupyter
notebooks to where you store your results. The two most important folders in that directory are
where you put your configuration, which basically looks at, if you wanted,
in simply explained terms,
how do we remove hard-coded variables
from our machine learning code?
And there are two types you typically find,
file paths and parameters.
So how do we remove that from our machine learning code
so that it's more reproducible?
So you'll put your configuration in a specific place.
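To make that idea concrete: instead of hard-coding a file path or a parameter in your script, you pull it from a configuration file. A rough stdlib-only sketch of the pattern (Kedro itself reads YAML files from a conf/ directory; the JSON file and parameter names here are just made up for illustration):

```python
import json
from pathlib import Path

# Hypothetical parameters file, standing in for Kedro's conf/ directory
Path("parameters.json").write_text(json.dumps({
    "test_size": 0.2,       # fraction of rows held out for testing
    "random_state": 42,     # seed, so runs are reproducible
}))

def load_parameters(path: str = "parameters.json") -> dict:
    """Load run parameters from config instead of hard-coding them."""
    return json.loads(Path(path).read_text())

params = load_parameters()
print(params["test_size"])   # the code itself no longer embeds the magic number
```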
We also talk about our data
catalog, which is one of the library components of Kedro, which is basically a series of extendable
data connectors. If you extend our abstract dataset class, you can actually customize and
create your own datasets for things that we don't typically support. But we support most file formats, so CSV all the way to Hadoop files and Spark tables as well.
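The data catalog idea she describes can be sketched in plain Python. This is not Kedro's actual dataset class, just a stdlib illustration of the pattern: each connector knows how to load and save one format behind a common interface, and the catalog maps names to connectors (the class and file names are invented for the example):

```python
import csv
import json
from abc import ABC, abstractmethod

class AbstractDataSet(ABC):
    """Common interface every connector implements (a sketch, not Kedro's real class)."""
    @abstractmethod
    def load(self): ...
    @abstractmethod
    def save(self, data): ...

class CSVDataSet(AbstractDataSet):
    def __init__(self, filepath):
        self.filepath = filepath
    def load(self):
        with open(self.filepath, newline="") as f:
            return list(csv.DictReader(f))
    def save(self, data):
        with open(self.filepath, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=data[0].keys())
            writer.writeheader()
            writer.writerows(data)

class JSONDataSet(AbstractDataSet):
    def __init__(self, filepath):
        self.filepath = filepath
    def load(self):
        with open(self.filepath) as f:
            return json.load(f)
    def save(self, data):
        with open(self.filepath, "w") as f:
            json.dump(data, f)

# The catalog is then just a name -> connector mapping that the pipeline
# reads and writes through, so nodes never touch file paths directly.
catalog = {"iris": CSVDataSet("iris.csv"), "results": JSONDataSet("results.json")}
catalog["results"].save([{"accuracy": 0.93}])
print(catalog["results"].load())
```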
Then we talk about the source folder being the next most important directory for you.
And in the source folder, you'll actually find the concepts of nodes and the pipeline.
Your nodes are just pure Python or PySpark functions that accept an input and have an output.
You use these nodes to actually construct a pipeline because it's basically a series of nodes which have their inputs and outputs mapped to each other.
So the pipeline can actually essentially work out where your dependencies are when you're building the pipeline.
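What she's describing, the pipeline working out dependencies from each node's declared inputs and outputs, can be sketched without Kedro at all. This is not Kedro's real API (that lives in `kedro.pipeline`); the `node` and `run_pipeline` names here are invented stand-ins for illustration:

```python
def node(func, inputs, outputs):
    """A node is just a pure function plus the names of its inputs and outputs."""
    return {"func": func, "inputs": list(inputs), "outputs": list(outputs)}

def run_pipeline(nodes, data):
    """Repeatedly run any node whose inputs are available: a tiny dependency resolver."""
    data = dict(data)
    pending = list(nodes)
    while pending:
        ready = [n for n in pending if all(i in data for i in n["inputs"])]
        if not ready:
            raise ValueError("unsatisfiable inputs for: %s" % [n["outputs"] for n in pending])
        for n in ready:
            results = n["func"](*[data[i] for i in n["inputs"]])
            if len(n["outputs"]) == 1:
                results = [results]
            data.update(zip(n["outputs"], results))
            pending.remove(n)
    return data

# Nodes can be declared in any order; the resolver works out that
# "clean" must be produced before "mean" can be computed.
pipeline = [
    node(lambda xs: sum(xs) / len(xs), inputs=["clean"], outputs=["mean"]),
    node(lambda xs: [x for x in xs if x is not None], inputs=["raw"], outputs=["clean"]),
]
print(run_pipeline(pipeline, {"raw": [1, 2, None, 3]})["mean"])  # 2.0
```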
And that is essentially the basis of a Kedro pipeline. And those are the primary concepts. There are obviously additional features
built on top of all of these features, including the data catalog, allowing you to version your
machine learning models and your data sets. And there's all sorts of ways that you can extend
Kedro however you feel. Okay, I think I'm getting it. So you have your data catalog, which is your input data,
whether it be CSVs or JSON or SQLite, right? The data catalog manages both loading and the
saving of your data. So if you specify where the output should be saved, then Kedro will handle
that too.
Oh, okay.
So it's on both ends.
Then in the middle, then,
you have your pipeline,
which is stitching together these nodes.
Is that correct?
Like you're saying,
the nodes are pure functions,
so maybe it trains a model,
maybe it does something else.
I don't know, maybe it predicts something.
And then you have your pipeline,
which is basically saying call function A, then B, then C, then D, or whatever. Is there more to a pipeline than just, like, here's the order of operations?
No, that's basically it. There are all sorts of things that you can do with the pipeline, so there are additional features, but you've basically nailed down why Kedro is so easy to learn.
Okay.
Because it's basically specifying which function do I need, what is its input, what is its output, and then let me put them all in this pipeline format. The pipeline syntax is very easy.
Very cool. So then the last bit is the output. Then you say this goes back to the data catalog, but what are the results these pipelines produce? Is it on a case-by-case basis based on what you're trying to gather from your model? Or is it always the same kind of thing that comes out on the other end? What do our results look like?
It depends on what you're using Kedro for. So an output could be a machine learning model. So even just a pickle file, which you would then use to test future predictions with. Or it could be, I don't know,
maybe a table that you need to be loaded
into some sort of like database.
Okay.
So it depends on like what you actually need the output to be
because that's what you would set up your pipeline to do.
I think I'm getting it now.
So when we go back to the Hello World,
because it's a simple one for us all to wrap our head around,
it's the Iris dataset,
which is a well-known data set
of the different pictures of flowers, right?
These three different types of iris plants.
And the goal is to classify, right?
So you give it, do you give it an image
and it says what kind of an iris it is?
Or do you give it a,
maybe just give it some measurements of the petals?
Talk us through the Hello World. We can use that as a reference point for conversation.
Cool. So the way that this pipeline is set up, it has four nodes. One will actually take in an iris dataset, so the actual values, and it will split the data into train and test samples, and also do some sort of data pre-processing as well to clean it up, so it's in a format that can be used.
Then the next three nodes will train a model,
then create the prediction model for you.
And then how this pipeline ends,
it ends slightly differently.
So this is why it always links back
to what problem were you trying to solve?
Because it allows you to report accuracy.
So in the last node, you can eventually feed in a value and then report accuracy on which flower, which iris flower, we are looking at, based on which values.
I see. So it's different. Like you said, it's all based on what you're trying to accomplish, and in this case, that's what it's trying to output. And so there you have it. The accuracy is important.
Yes. And that's why it will actually just tell you what the accuracy is at the end of the pipeline.
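As a rough sketch of those four stages, split, train, predict, report accuracy, here with a toy nearest-centroid classifier on made-up petal measurements rather than the real iris data or Kedro's actual starter code:

```python
# Toy stand-in for the iris data: (petal_length, petal_width) -> species
SAMPLES = [
    ((1.4, 0.2), "setosa"), ((1.3, 0.2), "setosa"), ((1.5, 0.3), "setosa"),
    ((4.7, 1.4), "versicolor"), ((4.5, 1.5), "versicolor"), ((4.9, 1.5), "versicolor"),
    ((6.0, 2.5), "virginica"), ((5.9, 2.1), "virginica"), ((6.1, 2.3), "virginica"),
]

def split_data(samples):
    """Node 1: split into train and test sets (every third sample held out)."""
    train = [s for i, s in enumerate(samples) if i % 3 != 2]
    test = [s for i, s in enumerate(samples) if i % 3 == 2]
    return train, test

def train_model(train):
    """Node 2: 'train' by computing the centroid of each species."""
    groups = {}
    for features, label in train:
        groups.setdefault(label, []).append(features)
    return {label: tuple(sum(col) / len(col) for col in zip(*points))
            for label, points in groups.items()}

def predict(model, features):
    """Node 3: predict the species whose centroid is nearest."""
    return min(model, key=lambda label: sum((a - b) ** 2
                                            for a, b in zip(model[label], features)))

def report_accuracy(model, test):
    """Node 4: report accuracy on the held-out samples."""
    correct = sum(predict(model, features) == label for features, label in test)
    return correct / len(test)

train, test = split_data(SAMPLES)
model = train_model(train)
print(report_accuracy(model, test))  # 1.0 on this tiny, well-separated toy set
```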
So if you did, like I mentioned, kedro new and then kedro run, once you've changed into the project directory that's created for you, then it'll actually just tell you what the accuracy of this model is.
How often do you think about internal tooling?
I'm talking about the back office apps,
the tool the customer service team uses
to access your databases,
the S3 uploader you built last year
for the marketing
team, that quick Firebase admin panel that lets you monitor key KPIs, and maybe even the tool that your data science team hacked together so they can provide custom ad spend insights.
Literally every line of business relies upon internal tooling, but if I'm being honest,
I don't know many engineers out there who enjoy building internal tools, let alone getting them excited about maintaining or even supporting them.
And this is where Retool comes in.
Companies like DoorDash, Brex, Plaid, and even Amazon, they use Retool to build internal
tooling super fast.
Retool gives you a point, click, drag and drop interface that makes it super simple
to build these types of interfaces in hours, not days.
Retool connects to any database or API.
For example, to pull data from Postgres, just write a SQL query and drag and drop a table
onto the canvas.
And if you want to search across those fields, add a search input bar and update your query,
save it, share it.
It's too easy.
Learn more and try it free at retool.com slash changelog. Again, retool.com slash changelog.
So your first open source project at Quantum Black, and so far a success, perhaps changing the landscape of data pipelines.
I like that. I like that line.
And with any open source project, there's always big wins and there's often big fails, struggles along the way. You mentioned the reasoning behind it was your clients needed some sort of a sustainability plan for these tools that you were working on for them or for their models to continue on after a contract was over.
What have been some of the struggles or the things you had to consider as you took Kedro open source?
Any insights you can share with the community?
One of the most surprising ones was actually our name.
So we were known by something else internally.
And when we were like, well, we want to go open source, we're going to go public, and everyone was on board,
we kind of heard from legal that, hey, we actually need to check out your name to check that it's not infringing on anyone's trademarks.
So I think this is a bit unusual for open source projects, where it's just like a personal project.
You would never think that, you know, I need to check my name for trademark infringement, and it needs to be kind of unique. And still, Kedro is a bit of an abstract name for what the tool does, but we stuck by it and it works. But I've actually come to discover that a lot of corporate open source projects,
including some friends at Uber have spoken about doing the same thing. So I was like, well, at least it's not unique to us that we had to undertake this.
Another challenge that we had was really thinking about
how do we support our users when they're using Kedro?
Because our initial request for a public Slack or a public Gitter was initially paused
and they told us we need to do some risk assessment on those platforms to work out, for instance, how do we enforce a code of conduct for users' behavior on the platforms as well so that we have free and fair communications on them.
So we knew that beyond GitHub issues and Stack Overflow, both of those channels that we do watch, we needed to do something else.
So you actually see that we spent a lot of time on our documentation.
So I'm really glad that you're enjoying the documentation
because we knew that when we have users coming to Kedro,
they want to get started very quickly.
And if they run into issues, they need to be able to go to the docs
and be able to troubleshoot their way through at the very least
if they're not going to talk to us on GitHub issues,
which is sometimes a big thing for people to post questions
or even stack overflow.
But you'll be excited to know that we will be getting
public communication channels soon.
So that's in the works for us to eventually release to the users.
And then the third thing was that we didn't expect the community
to pick up this quickly.
That was something that we were not prepared for on the team.
So Kedro is maintained by nine people. So it's me as product manager, we've got
Ivan Danov as the Kedro technical lead, and then an amazing group of like software engineers,
machine learning engineers, visual designers, and a front-end engineer who maintains the data pipeline visualization tool called Kedro-Viz. And we were not ready for, I think, how many GitHub issues were created, pull requests were
created because people wanted to contribute code back to Kedro and make it better with us,
and how many questions we get on Stack Overflow. So that was a bit overwhelming. And I think we're
still finding different strategies to manage it.
The one that we do do on the team,
because we're inevitably the ones that know the most about Kedro
and how to fix different issues,
is that we have a rotating role on the team called the wizard.
And it's basically we have a wizard
and then the rest of the group are the warriors.
And if you're the wizard for the week,
your job is to basically field all user queries,
both in our internal channels
as well as our external channels as well,
to try and make sure that users get like quick
and speedy responses to their questions
or to any issues or feature requests that they've raised.
So that was one of the strategies that has made things a bit better.
But we are looking at ways to even scale support
for our open source users in the future.
So we're looking into new roles that McKinsey and Quantum Black
have never hired before, because we've never open sourced anything.
But developer advocates or DevRel, I think it's sometimes called
developer relations, to come on to the team and help us
really scale out what that model will look like
for us. So yeah, those are some of the unexpected, I think three of the unexpected things that we
didn't realize when we were open sourcing would happen. Awesome. Very good. Let's go back to number
one. Let's go back to the name. What does it mean, Kedro? It means the center in Greek. So we kind of
look at it, Kedro as being the center of your data pipeline
because of the way that it forms
kind of like the infrastructure-slash-foundation
of your analytics project or ETL pipeline
or whatever you need to build.
So Kedro means the center.
The center.
And so after you came up with the name,
that's when the, was it a trademark search that went out? I'm sure there were other searches as well. Did you search on GitHub and Twitter and domains? Were these also things that you took into account before picking the name?
Yes, it was actually a lot more involved. Naming at Quantum Black has been quite exciting, because Kedro is one of three main products, and we'll be expanding the range of products that we have. So naming is, um, it's always an issue in Labs, where we sit, because we sit within Quantum Black Labs, which does all the product engineering. And yeah, the naming exercise was a stakeholder management exercise, because we had to have happy branding, because I was helping with the product marketing along with our global head of marketing, Catherine Shenton. But we also had to make sure that the team is happy to represent the name as well. So beyond having, I think, agreed consistency on a few names, they also had to go through the social media check. They also had to go through reference meaning checks, or even checks in other languages as well, because Quantum Black is a global organization. And then they had to go through the legal check to check wherever there were trademarks for this name in the many jurisdictions that McKinsey operates in before we came to Kedro. So Kedro was marked as the least risky name.
What were some of the riskier names? Can you share them or are they just on the cutting room floor,
never to be mentioned again?
Some of the more interesting names were, I think when there was no agreement on names,
consisted of names like Burano and Pimlico, which is a plumbing company in London.
And the list of five names that included Kedro in the end that went for final legal verification
included Braze, which kind of worked because it referred to welding or stitching
together pieces, which is what the pipeline does. We had Knittic as well, spelled with a K, which was also playing with that whole thing of knitting things together.
Knitting.
Yeah, Knittic. You see, it doesn't work when you try and say it. So that's one of the reasons why
this name failed. And then Spindle, because we were trying to reference the whole thing of many threads joining together.
I like Spindle.
Yeah.
But all of those names failed the legal verification in the end.
There's no worse feeling than when you come up
with the perfect name, and then you go do all your due diligence,
and somebody else is using it.
And you're like, no!
Yeah.
The checks also included GitHub repositories as well, because we knew we'd have to survive there. And that's it. So it was really just, it was an adventure to find the name, and I'm really glad that we settled on Kedro. It was the one that fit.
Some good things about the name Kedro. So I think I either wrote this up at one point, or I need to write up, the anatomy of a good name. And I'll say: two syllables, great. The fact that you can spell it
easily after hearing the word, great. It's not ambiguous in that way. But it is intriguing
because when you hear it, you don't know what it means. There's no immediate attachment, at least
in my mind, maybe to native Greek speakers that there are, but to an American mind, there's no
immediate attachment to anything. So it kind of stands on its own.
That's true.
That's true.
We never thought of it, and thank you.
I will give those reasons to the team.
You get the seal of approval.
Let's move on to the point, too, that you mentioned,
kind of the community side and the documentation
and how to figure out kind of where the community meets,
the code of conduct, all of these different things.
I'm curious because you seem to have checked all the boxes.
We look at a lot of open source projects.
I looked at Kedro and thought, okay, Apache license, they got a license, they got a code of conduct, they have documentation.
As I said, I've been impressed with the documentation.
So it seems like there was at least knowledge of we need to do this correctly.
And then also, here is our effort at correctly.
So I'm curious, was there research done into how to open source a project, or had people on the team done it before? How did you guys know what boxes to check and which things you really needed to address before you could open source it?
So the motivation for great documentation actually came from our technical lead. For a while, he'd been passionate about the idea of creating an end-to-end tutorial.
So you'll see Chapter 3 in the documentation.
It takes you through this amazing space flights tutorial.
It takes about two hours to run through everything
and gets you acquainted from beginner
all the way to intermediate functionality in Kedro.
I think it was Ivan's passion for the users and them being able to learn and understand the tool, because Kedro documentation before that had literally just been kind of like the API docs and the user guide, where we describe how the individual parts of the library, like the pipeline, the nodes, and the data catalog, work. So really, kudos to Ivan for thinking and pioneering
that we need to do better
in terms of like how we explain these things
and also using this as a solution
for the fact that we couldn't host a Slack channel
when we open sourced.
But in terms of how do we set up the code of conduct, and how do we think about this practice of how we set up a GitHub repository:
I used to spend weekends going through
kind of like people talking about how to do open source community management and what does best
practice look like for running a community. So it was important for me that code of conduct went in
so that we'd have a way to enforce it if something were to go wrong on the repository; luckily, nothing has. And we'd have a way to communicate with the users by referring them back to the code of conduct, one, and then also being able to take action to resolve things. So yeah, it was kind of just, how do we have an empathetic view of how someone will perceive a product, and how do they get on the best side of that, trying to put yourself in the user's or the first viewer's shoes. On this note, additionally,
I do know that open source does have diversity issues as well. So I'm really excited to, you
know, eventually see hopefully PRs from like women or women identifying people on Kedro. So
yeah, at all times, like we do also have a style guide for how we communicate with our users as
well. That was also something that we set up. So how do we say, you know, thank you for contributions? How do we
respond to questions as well then? And that is a team standard, because we never want to create an
environment where someone is offended and doesn't want to come back and interact with us on our
different channels because of maybe a comment or whatever the case might be. So those are things that we
did look at. Very cool. So I will say to your third point, which was you were a bit surprised
by the level of success or people glomming on and using the project, I would say probably has to do
with the stuff that we just talked about and the thoughtfulness and the TLC that you put in
around open sourcing and doing things well. That being said,
I'm curious if you had some sort of a launch plan or like,
were there ways that you said, okay, Kedro is out there.
We want people to use this. We want people to try it.
Were there press releases? Was there blog posts?
Or was this just a thing that grew organically after open sourcing it?
That is a fantastic question. And there was a huge release.
I mean, it was McKinsey's first open source product.
So everyone was very excited.
The other thing that we realized in hindsight is that I'm really passionate about product marketing
because you can kind of solve it as a,
it's a user problem.
Because how do I make sure that
you're getting the right information
so that I don't waste your time?
And you also learn like what value something might have for you because it was, you consumed
it in a short time and in a form that you needed it in.
So I kind of approach product marketing on this, like I approach like product management
and it's like, how do we optimize for the time that you have?
So I was able to construct a massive marketing campaign with the Kedro team and with our head of marketing and with McKinsey branding in order to actually deliver what was our massive open source release in June of last year.
So press releases went out.
Articles were released on Towards Data Science.
There was social media.
There were some webinars done. And yeah, we just wilded out a little bit on being
able to, I think, basically make history for the firm and release their first open source tool in
a place where McKinsey is recognized as having this like amazing intellectual property that
would never open source. So it was really just exciting for us. So that's why we went big
on that launch. Do you think the success of Kedro will lead to McKinsey open sourcing more things,
or is this more of a one-off because of the client need?
Good question. And I can actually show you an example where that's
not the case, because it must have been, we're going on two weeks ago now,
we released our second open source project, CausalNex, which is essentially a causality
Python library that helps data scientists
address the question of causality
versus correlation in your data sets.
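For readers curious what "causality versus correlation" looks like in code: CausalNex builds on the NOTEARS method mentioned below, whose key idea is a smooth acyclicity penalty h(W) = tr(e^{W∘W}) − d that is zero exactly when a weighted adjacency matrix W describes a directed acyclic graph. This is a minimal NumPy sketch of that penalty for illustration only, not CausalNex's actual API:

```python
import numpy as np

def notears_acyclicity(W: np.ndarray) -> float:
    """NOTEARS smooth acyclicity score: h(W) = tr(e^{W * W}) - d.

    Returns ~0 when W encodes a DAG; strictly positive when W
    contains a directed cycle. (W * W is the elementwise square.)
    """
    d = W.shape[0]
    M = W * W  # elementwise square, so every entry is non-negative
    # Matrix exponential via a truncated power series (fine for a demo).
    E = np.eye(d)
    term = np.eye(d)
    for k in range(1, 30):
        term = term @ M / k
        E = E + term
    return float(np.trace(E) - d)

# A 2-node DAG (x0 -> x1): no cycles, so the penalty is zero.
dag = np.array([[0.0, 1.0],
                [0.0, 0.0]])

# Adding the reverse edge creates a cycle: the penalty turns positive.
cyclic = np.array([[0.0, 1.0],
                   [1.0, 0.0]])

print(notears_acyclicity(dag))     # ~0.0
print(notears_acyclicity(cyclic))  # 2*cosh(1) - 2, about 1.086
```

Structure learning then becomes continuous optimization: minimize a data-fit loss over W subject to h(W) = 0, instead of searching over discrete graphs.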
And we released this one purely
because it was an exercise of how do we do,
how do we showcase some of the R&D work
that we have done on client projects
and eventually have made their way into more formal products
because we've been able to try and test those methods.
The really cool thing about CausalNex is that the research that it's built on
was actually presented at NeurIPS, not at the last NeurIPS, but the NeurIPS before.
And we had quite a few data scientists that were intrigued about this whole
NOTEARS approach for assessing causality. And now this year we're finally releasing the
open source library that, well, the library that we were able to build using that theory. So
yeah, we're still working out what our open source strategy looks like for the firm.
There are still many interesting questions about how we tackle it. You know, what do we decide to open source? What is our process, finally
nailed down? Because I still think that there are places that we could optimize the process a bit
better. But it's just an exciting place that we find ourselves in. We hope that it inspires many
more people within the firm to consider open sourcing. I think I do get emails with people
asking like, how did you open source Kedro in hopes that they can do the same? So maybe it was
just the first of many. I really do hope. For those curious about CausalNex, I have scooped
it up and we'll include a link in our show notes. So you can click there and check it out. Hey, what's up? Adam Stachowiak here. I got a question for you. Have you heard of our newest
podcast yet called Brain Science? I'm not going to be offended if you haven't. It's okay. But
here's how you find out more about the show. Go to changelog.com slash brain science. We have 10
episodes on the podcast feed. So have fun, go back and listen to all 10
and subscribe to get more. I actually get to co-host this podcast with a doctor. That's what
makes this podcast legitimate. If it was just me, it would not be as cool, but I get to team up with
Mireille Reece. She is a practicing clinical psychologist, and it's so much fun digging
into deep subjects around being human.
Here's a preview of episode number 10. We're talking about shame.
We haven't talked about this as it's relevant to creativity. And if you can see that when we are
trying to navigate shame, this sense of inadequacy, do you think you're going to be more apt to be creative or less apt to be creative?
I would guess less apt because I'm trying to focus on fight, flight, or freeze in those moments,
and I've got no time to be creative. I got to be the most necessary Adam possible to get through,
right, rather than be creative. Yeah. So let me tell you the dynamic. Adam,
I need you to be remarkably creative so you can come up with the best, most user-friendly way for this to work. Except you suck, you didn't do it enough, and you need to do better.
Don't ever tell me that again, Mireille. That is not nice. But I can understand how, in that kind of moment. So if you're leading teams out there, don't lead with shame.
Okay.
Well, it's really recognizing the way that you have to, right?
If you can shift your mind into seeing this through a lens of management, like, I need to manage how I interface with other people, especially around creative endeavors.
Yeah. That I need to be deliberate about identifying what they're doing well, and even saying,
like, create clarity about what you want them to improve upon.
All right.
To keep listening, head to changelog.com slash brain science slash 10.
And that will take you to the episode titled Shame on You, where we examine the hustle
of not enough, how shame relates to
imposter syndrome, our fight, flight, or freeze lizard brain response to threats, and so much more.
Again, changelog.com slash brainscience slash 10, or search for Brain Science in your favorite
podcast app and subscribe. We'd love to have you as a listener.
Switching gears a bit, let's talk about fun stuff.
You are involved in something that looks very cool.
The development of what you call a social impact virtual reality film as a Sundance New Frontier Lab fellow.
Tell me about that. Sounds intriguing.
Sounds like you've been digging around on the Internet, which is cool.
Maybe just a little bit.
I was a Sundance New Frontier Lab fellow along with a co-creator, one of my best friends, Sharifa Ali. We've been best
friends since high school, grade 10. And the reason we were Sundance New Frontier Lab
fellows in 2018 was because we'd been working on, at that point it had been two years,
we'd been working on a film, a virtual reality film called Otomu.
So Otomu is based on this Kenyan myth that if you walk around
a sacred tree seven times, you change gender. So you go from male to female. And it's kind of like
this interesting comment on gender fluidity ideals, which are kind of seen as un-African.
So obviously it now tackles this whole thing of why modern-day Kenya is so averse to gender fluidity ideals and homosexuality as well.
So there are a lot of issues that we can dig up.
Obviously you look at religion and colonialism as factors that would
impact that. Moving on from there, we were excited,
also two weeks ago, to actually head to Sundance 2020 and actually
premiere Otomu.
So we've been iterating on the piece since then,
and we were able to actually showcase the piece at Sundance.
So yeah, that was a really, really cool experience,
and there's still more work to do,
because that was essentially,
Otomu is in its fourth iteration right now,
and there still will be a fifth,
which really focuses on how do we distribute the film
to different partners in the US, Europe,
and also back home for us.
Sharifa is from Kenya, from South Africa.
So how do we do that in an accessible way?
So this was Sundance Film Festival 2020 just happened,
man, a couple of weeks ago. So a fresh
thing that just happened. Curious how you even film a virtual reality film. And when you're
iterating, are you filming more things? Or tell me about the process. Sure. The way that
Otomu is set up, it's built in Unity. Essentially, it's also a dance piece, so we had access to two dancers
who would essentially do motion capture to capture how they're moving. You can
build an avatar from that and place it in any environment. In our case, we placed you in the
Karura Forest, where Mugumo trees, which is the sacred tree, are typically found. And that's
essentially how the environment is built.
There was also some thought around how do we do different user experiences as well?
Because we did optimize the piece for the Oculus Quest.
We specifically waited for the Oculus Quest to come out
because we knew it was a higher resolution
virtual reality headset.
So higher than the Go, for instance,
but still more accessible in terms of price
than the Rift was, the Oculus Rift.
So we specifically designed it for that.
And we also were fascinated about the multiplayer experience
which we've had some moderate success with.
I think the next iteration will fully nail down
what the multi-user experience is supposed to look like.
But that's essentially how you do it.
So you can decide.
I think there's many different ways of filming virtual reality experiences. I think you have seen the 360 video
kind of like documentary pieces, which are still released. But a lot of pieces will focus on
using motion capture to build avatars and then eventually constructing environments around
the different avatars that you will have. So in terms of presenting that at a film festival, is it just a room with a bunch of these Oculus
quests and you go in and have, because it seems like an individual experience versus
a shared experience.
That's a good question.
So how our piece was constructed is basically that.
So when we were presenting at the Sundance New Frontier Space within this year's festival,
we were in the biodigital theater, which was a specific area set aside for what they called multi-user virtual reality experiences.
In Otomu, we're able to get seven users in at the same time.
So that means you each put on your headset, each have your own set of controllers,
and you can each affect the piece based on how you're interacting with it.
In the piece, you are ancestors who are basically saying
that this way of life is right,
and you're following a character called Waikiki
as they make not a gender-based transformation around the tree,
but what we call the journey to be the most honest version of themselves.
So you help this character along the journey as they kind of like dance and
struggle around the tree in their journey.
So it was important to us to have the multi-user experience.
It was a design point for us because the artistic intention for it is
everyone understands what it's like to not be your true self everywhere.
It's a human experience of sometimes being unsure of yourself
because, I don't know, someone's made a comment
or doubting yourself or even hating parts of yourself.
And you being able to support someone on that journey
means we would love for you to be able to know
that you can support other people in their lives
on their pursuit to be the most honest versions of themselves.
And you also recognize that you're not alone in that experience as well,
because there are other people going through the same thing.
So that was why we opted for the multi-user experience.
We still had some technical challenges with it.
The system that we're using currently is quite expensive,
which goes against one of our design principles
of making this piece accessible.
But we're looking at different ways
to actually try and code it,
kind of like hack the Oculus Quest,
if they don't help us in the end,
to do the multi-user experience
without having additional gear.
Because that is really, really important for us
in terms of distribution.
Wow, so how was it received?
Very well received.
Sharifa and I also made certain choices about whether or not we went explicit
in the way that we described some things, versus very abstract.
And we stayed towards the more abstract focus,
but people got the intent.
People understood why they were in the piece,
why they were the ancestors,
and why it was a multi-user experience.
They also understood how the piece was constructed
in terms of Waikiki's journey
and understood everything that happened there.
So we have received like really fantastic feedback.
We have obviously opened it up and told people that,
hey, we're still iterating on this piece,
so any critical feedback that you have, do let us know, we're still iterating on this piece. So any critical
feedback that you have, do let us know so that we can build it into the piece. So yeah, it's been
very well received. And we're very happy. This has been a long, long time project. But yeah,
I'm really excited for, I guess, 2020 and finally completing Otomu and seeing the impact that it was supposed to create.
Yeah, I was just going to ask if it seems like this is like a living project or if it's
just one that's still being formulated and will eventually come to its natural conclusion.
It sounds like there will be a completion step, like when you brush off your hands and
say we're finished.
I don't know.
Virtual reality experiences, at least the model that currently exists, mean that, I guess because the tech is moving, but it's not really moving that fast, they live longer lives than I think mainstream films or mainstream media would. Someone with a really old Oculus Rift can still watch movies from today, for instance. So in terms of an overall
end of Otomu, I'm not sure that there is one, because we are considering different models of distributing
the piece in museums and even schools, as well as working with non-profits as well, because
the intent of the piece obviously is the journey to be the most honest version of yourself,
but we can bring someone to understand that it's actually about how do we deal with LGBTQ issues
and accepting people that have different sexuality preferences. So really being able to use it with
non-profits is also important for us too. So we hope that the piece lives on, because the intent and the meaning behind why it was made still remains relevant.
My ideal world, I think, not that I know Otomu could achieve this.
It would be wild if it could.
It would be that the piece actually wouldn't need to exist because everyone was just comfortable in their skin and was accepted.
But we have to do our part.
That's why we hope the piece lives on.
Very cool, Yetunde. How do people get in touch with you out there on the internet?
Where can people reach you?
On the internet, I'm quite active on Twitter, so you will find me tweeting away. I'm at
@yetudada, so Y-E-T-U-D-A-D-A. I'm on Instagram as well, but I don't really use that one anymore.
And yeah, if you find me on LinkedIn,
I only accept people I know, then.
So Twitter is probably your best bet.
Twitter is probably your best bet.
Yeah, that's the easiest place to reach me.
Otherwise, you'll find me on the Kedro GitHub repository
or on Stack Overflow, especially when I'm the wizard answering questions.
So yeah, those would be the two channels to find me.
Very good. Well, thank you so much
for joining us today. This has been
a very interesting dive into Kedro
and also into Otomu and the work you're doing
at the Sundance Film Festival and
beyond. Good luck to you on both endeavors.
Any final words from you
before we say goodbye?
Quantum Black Labs
specifically, so the product engineering part of Quantum Black,
we're hiring.
We need help finding amazing people
that can do software engineering.
So from full stack engineering all the way
to even just being like Python focused devs,
we're looking for you.
We're also looking for product designers as well.
If you spike on UX, you will be loved.
If you spike on the visual side, you will also be loved.
And product
managers as well. So we're really
rounding out the team. And as mentioned,
specifically on Kedro, we're looking for
a developer advocate and
dev relations people. And if you can do Python
as well as being one of those people, then we love
you more. So yeah, if
you're interested,
I think the best place is to head through to actually the McKinsey website,
which would host our job offers.
But otherwise you can just like
ping me on Twitter and send me your CV
and I put it straight through to the recruiters.
It makes your life so much easier.
There you go.
Hit her up on Twitter and get that ball rolling.
Well, thanks, Yetunde.
This has been awesome.
And to everybody else, we'll talk to you next week.
All right.
Thank you for tuning into the Change Log.
If you're not subscribed yet to our weekly email,
you are missing out on what's moving and shaking in the world of software
and why it's important.
And, of course, it's 100% free.
Fight your FOMO at changelog.com slash weekly.
When we need music, we summon the Beat Freak Breakmaster Cylinder.
Our sponsors are awesome.
Support them.
They support us.
We got Fastly on bandwidth, Linode on hosting, and Rollbar on error tracking.
Thanks again for tuning in.
We'll see you next week. Thank you.