The Data Stack Show - 96: How To Collect and Leverage Data From the Physical World with Prateek Joshi of Plutoshift
Episode Date: July 20, 2022
Highlights from this week’s conversation include:
Prateek’s background and career journey (2:10)
The lack of advanced data tools for the physical world (4:55)
Dealing with data from the physical world (10:53)
Stocks in the physical world (14:20)
What it takes to execute this kind of project (19:05)
Challenges around this infrastructure (25:56)
ML tools that are useful in this environment (31:55)
Physical instrumentation and environmental interaction (36:43)
Current adoption of physical instrumentation (42:50)
Data’s responsibility in sustainability (45:56)
The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.
RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
Transcript
Welcome to the Data Stack Show.
Each week we explore the world of data by talking to the people shaping its future.
You'll learn about new data technology and trends and how data teams and processes are run at top companies.
The Data Stack Show is brought to you by RudderStack, the CDP for developers.
You can learn more at rudderstack.com.
Hey Data Stack Show listeners, Brooks here.
Usually, I'm behind the scenes keeping things rolling for the show, but today I'm coming out
of hiding to share some exciting news. We have another live show coming up, and we want you to
join us for the recording. This time, we're bringing back Tristan from Continual and Willem
from Tecton to talk about the future
of machine learning. We'll record the show on August 10th at 2 o'clock Eastern, 11 o'clock
Pacific. So mark your calendars and visit datastackshow.com slash live to register today.
All right, Kostas, this is going to be such a fun show.
One, I'm losing my voice for some reason.
I think I talked too much in meetings yesterday.
But two, we are going to talk about a sort of type of data and context for data that
we've never talked about on the show.
And that is data that's emitted from the physical world by sensors attached to all sorts of different devices,
specifically in manufacturing. We're going to talk with Prateek, and he runs a platform that
collects all of this data from all of these different physical devices and sensors in
factories, which is fascinating. My big question is going to be about the data itself, actually.
Because when we talk about data in the context of a normal business, you have your transactional data, you have your analytics type data.
And we talk a ton about that on the show.
But I'm just really curious to know what it's like to work with data that's being emitted from sensors.
So that's what I'm going to ask. How about you?
Yeah, I'd love to learn more about what it means to instrument the physical world.
How we can collect this data and what are the challenges and do we need like a
different kind of stack to work with this data?
So I think we have the right person today to have this discussion.
So that's good.
Let's jump in.
Yep.
Prateek, welcome to the Data Stack Show. We are so excited to have you.
Thanks, Eric. It's great to be here.
All right. Okay. So I have so many
questions about data and the physical world and sustainability. But of course, let's start where
we always do. Give us your background and what you do today at Plutoshift.
Sure.
I am the founder and CEO of Plutoshift.
Plutoshift is a data platform for industrial sustainability.
We help industrial companies achieve their sustainability goals with smarter operations.
What that means is operational data
like pressure, temperature, and flow rate flows into the platform, and what comes out are metrics
that the operations teams can use to monitor their physical infrastructure. And we help
companies that make physical products like beer and ketchup and cheese. So these operations are very continuous.
So keeping an eye on them at all times and detecting and predicting events of interest
is of the utmost importance to our customers.
So that's what we do.
At the end of it, we help reduce consumption of resources like electricity, chemicals, and water, which are very critical for carbon footprint.
With regards to my journey into machine learning, pretty much machine learning is all I've done in my career.
I started my career at NVIDIA and over the last few years ended up building systems for mobile, cloud, edge, and I've worked across a number of data types like images, text, and time series data.
And along the way, ended up publishing a few books, technical books on the topic, mostly oriented towards developers on how to build production machine learning systems.
How do you think about
applications, data? How do you make sure that it addresses a variety of use cases?
So yeah, that's been my journey into this field. Very cool. Okay, I'd like to start with something
that you mentioned when we were chatting briefly before the show. And you said that in terms of
data tooling, you know, so for every hundred tools in the world of software, you know, there's like
one tool for the physical world. So, which is interesting, right? I mean, the world of software
is really young, but there are new data tools every day.
I mean, you know, the space is just completely exploding.
So I'm interested to know why you think that is, because intuitively I would think, okay, well, you know, manufacturing has been around for way longer than software. And so, you know, why are there so few sort of advanced data tools, you know, for the physical world, especially when
we think about, you know, manufacturing and physical processes like that? That's a wonderful
question. You know, when we talk about tools, we mean software tools, right? So for the world of
software, we have many, many tools precisely because they're software native, meaning software by definition
is native to itself. And when you build a tool for software, they play very well together by design.
So if let's say you have a data center or you have a large number of servers you need to keep an eye
on and you build a tool that can take in that data and provide you with the metrics.
All of that, they play the same game, same framework, right?
And when it comes to the physical world, it's been around for way longer.
And it's not, at least in the past, go back 10, 15, 20 years, it wasn't instrumented in a way that all the tools can play nicely with it.
So that's why, just by design, software allows, when I say software, basically, I mean any infrastructure we use to build software.
By design, it lends itself nicely to all the tooling. But in the physical world, the infra is not
commoditized, meaning collecting the data, storing, retrieving, connecting to the physical source,
making sure it's continuous and doesn't break. And also the people, right? The people who run the
physical systems have their own requirements, right? They're not software engineers. So because of this gap, the software world really went way, way ahead.
But now, in the last 10 years, physical infrastructure has been getting instrumented a lot.
In fact, today, I would say most of it is commoditized, meaning if you have a large number of pumps or membranes or filters, you install sensors
that are connected to the internet.
They beam data to the gateway and then to the cloud, then to a nice data warehouse.
From there, you can connect to it via API and build amazing applications.
Now, all of that happened in the last 10 years. So for any company or any developer, for them to build any kind of
tooling, this infrastructure needs to exist. And that's what's happening now. So that's how I look
at the number of tools that exist, the difference, and how physical infrastructure
is now definitely catching up. Yeah, that's interesting.
You know, another thing, and I'd love, because I really don't know a ton about, you know,
physical manufacturing, but the other side of it is your point on sort of software,
you know, sort of being native to itself almost, right? Like it naturally produces data, right? Or
like interacts with or relies on data. And so
there's sort of a fundamental principle there. Whereas in the physical world, let's say,
you know, there's a process where there's canning or bottling or something like that, right?
And I know this is changing, as you said, but the machinery is designed to,
you know, put caps on bottles, right? Not necessarily to produce data, right? It's designed
to do a very specific thing at a certain speed, you know, consistently. And you need enough
data to make sure that the machine keeps running, but it's not necessarily a fundamental
layer in the structure of the thing itself.
Right, right. That's actually a very good way of putting it.
And that's precisely the point is by design,
as you said, a process in the physical world
is designed to make that product.
In this case, let's say if the product is a beverage,
the process is designed to handle the steps in between and produce the
result, which in this case is beverage. And along the way, if you want to keep an eye on it,
use that data to make decisions, you need to instrument the process, right? By itself,
it's not going to just produce the data. And actually,
it works in a similar way in software as well. Meaning, let's say you have a large number of servers or applications.
Unless they're instrumented to beam data to you, they're not going to do it by
themselves. But you almost do it just because it's there. It's easy to do.
And doing it almost feels native, right?
It's not like a separate task.
You write a function, you do a quick test, you want it to beam data.
So that's, yeah, it's a different framework.
Yeah, absolutely.
And I mean, instead of opening a code editor, if you want to instrument a physical machine, someone has to go take the machine apart and install something. I'm going to steal one more question, Kostas. In the world of software, you know, there are certainly various types of data, but you have your usual suspects, right?
You have sort of customer-ish data, call logs, customer records, clickstream events, blah, blah, blah, financial transactions, you know, inventory, all that sort of stuff, right? You talked about pressure, you talked about temperature,
flow rate. Those are all very different units. I'm sure that the instrumentation capturing those is different. What is it like dealing with that data? Is it just a totally different
world than working with your standard, say, data that drives BI for, you know, a consumer business?
In the physical world, the analogy I like to use is there's process and there's asset, right? So
the equivalent of that in the cyber world is server and application, right?
One is like a real thing you can touch and feel.
Application is abstract.
Like you define the application and it does a certain thing.
So similarly in the physical world, asset is a pump or a membrane.
And a process is, hey, I need these 79 membranes to do X, Y, and Z.
And here are like seven steps to do it.
So that's one.
So that's what we work with in the physical world.
And the type of data that ends up coming out is,
one would be the operational data, like sensor data.
What I mentioned, temperature, pressure, flow rate,
all of that is measured by sensors that are installed at the right location.
And then we have ERP data.
ERP, enterprise resource planning, is, as the name says, about resources.
Basically, it contains data about how many resources are being consumed, when, what, scheduling.
All of that information comes from ERP.
And the third type is CMMS or maintenance.
Basically, the infrastructure is being managed by the operations teams.
And what do they do on a daily basis?
Basically, if they do something, they've got to make a note because if your shift ends and the next person comes in, they need to know, right?
Hey, that membrane number 72 underwent a maintenance event.
Or pump number nine has been consuming 4x the amount of electricity.
So need to take a look.
So basically, CMMS systems store all these maintenance events.
And these are the three main types of data we end up working with.
And again, within each type, there is so much variety.
But that's what we work with. Now, to build a product in the physical world, it works very similar to
like a workflow management tool, meaning you have to define a workflow. You have to define
what data, what format it accepts. And then once you define that, then the workflow becomes well-defined, the model plays
a role, it outputs the specific metrics in real time. And that's what we end up doing because
you're right. If you don't define anything, the variety is so much that it becomes like a custom
project every time. So to build a product, we'll say that, okay, let's say there's a workflow called membrane monitoring.
Right? By definition, it's going to monitor membranes and it's going to
accept say 14 columns of data.
And column number one should be temperature, column number two is
pressure and so on and so forth.
So basically you define the types of data that can enter and the right format.
It goes through the workflow.
The workflow is what we design.
It's the design of the product.
And then it outputs the metrics
that can be consumed on mobile or cloud.
And yeah, that's how we end up
productizing this offering.
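To make the workflow idea concrete, here is a minimal Python sketch of a workflow that declares which columns it accepts and reports anything that does not fit. The names (`Workflow`, `Column`, `membrane_monitoring`) and the units are hypothetical illustrations, not Plutoshift's actual schema, and only three of the 14 columns mentioned are shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    unit: str


@dataclass(frozen=True)
class Workflow:
    """A workflow declares up front which columns it accepts."""
    name: str
    columns: tuple

    def validate(self, row: dict) -> list:
        """Return a list of problems with one incoming row (empty list means OK)."""
        problems = []
        expected = [c.name for c in self.columns]
        for col in expected:
            if col not in row:
                problems.append(f"missing column: {col}")
            elif isinstance(row[col], bool) or not isinstance(row[col], (int, float)):
                problems.append(f"non-numeric value in column: {col}")
        for col in row:
            if col not in expected:
                problems.append(f"unexpected column: {col}")
        return problems


# Hypothetical definition; a real workflow would list every accepted column.
membrane_monitoring = Workflow(
    name="membrane_monitoring",
    columns=(
        Column("temperature", "degC"),
        Column("pressure", "bar"),
        Column("flow_rate", "m3/h"),
    ),
)

good_row = {"temperature": 21.5, "pressure": 4.2, "flow_rate": 12.0}
bad_row = {"temperature": 21.5, "flow_rate": "n/a", "vibration": 0.3}

print(membrane_monitoring.validate(good_row))
print(membrane_monitoring.validate(bad_row))
```

Declaring the accepted columns once, next to the workflow, is what keeps each new customer from turning into a custom project: data either fits the declared shape or is reported with a specific reason.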
Yeah, super interesting.
Okay, Kostas.
Yeah.
I've been holding the mic too long.
Oh, no, no, no.
That was awesome.
So, okay, a couple of questions, but I'd like to start with an observation
because I hear you talking all this time and you're describing processes
that engineers are probably familiar with, like how we instrument and collect the data
and use the data to figure out what works and what not
and all that stuff.
And there is something similar happening in the physical world.
Now, in the digital world, let's say, in the cyberspace,
we have very specific technologies and stacks that we are using.
It's like we have the Snowflakes of the world out there.
We have data warehouses.
We have time series databases.
We have, I mean, all the things that you can find in the AWS catalog of like 179 or more.
I don't know how many products they have, right?
Right, right.
Is there any equivalence between like how, like what's the stack in the physical world?
You mentioned a few things about like sensors
and like the ERP tools and all that stuff.
But if someone wanted today to go out there
and build like an instrumentation of a factory, right?
To instrument a factory, what do we need?
And how similar is it to the digital world?
Yeah.
Actually, I'm going to divide the stack into two big parts and we'll dive into each.
One is instrumenting a physical system and getting the data to a data warehouse.
Let's call that one part.
The other part is once you get into a data warehouse,
how do you process it? How do you structure it, extract knowledge and deliver to the end user?
Now, the second part works very similar to how it works in the cyberspace. And actually,
that's a good thing because once it gets to a Snowflake, we can use very good, well-defined,
well-accepted tools to process
that data and deliver a great software product. But the first half is where it gets very interesting
because that has no equivalent in the cyberspace. So if you are running a facility and it's not
instrumented in any way, it's not doing anything in terms of data. So let's take
a simple example. Let's say you are running 45 membranes and you want
to collect some basic data. What you want to do is install sensors,
basic sensors that can talk to the internet. And big companies
have started manufacturing these at a very low
cost, like Honeywell is a good example.
But many, many hardware companies, they sell sensors that can talk to the
internet and they're very well defined.
Okay.
So you installed a whole bunch of sensors.
Now from there, from the facility, all of that, all those sensors, they're not powerful enough to stand on their own.
Or they could be, but it's going to be very expensive.
So you install low-cost sensors that are connected to a gateway locally. Intel, for example, makes gateways.
It's like a big black box that can talk to the 100 sensors in the facility.
So the sensors beam the data to this gateway.
The gateway is very powerful, with a lot of processing power. It can connect to the internet. It can beam the data to the cloud,
let's say AWS. And once the raw data comes in, the next step is you pre-process it,
you put it in a data warehouse or a data lake, depending on your choice. And once it gets there, after that,
the methodology is standard.
You connect to it via API, you pull, you query it,
you can do whatever you want,
like a good data intelligence product.
And really, that's how, at a high level,
that's how it works.
And there are many firms,
they're called integrators, right?
You just go to them and tell them,
hey, I want to instrument my physical system.
They'll go out and they'll shop for hardware and the software you need to,
for the sensors to talk to the gateway.
They'll figure out all of that for you and they'll just build it and
and your factory is now instrumented.
So that's how, that's how it works here.
How long does a project like this usually take?
Let's say to instrument a factory, right?
Because in software, I mean, we are experienced enough to be like,
okay, we start a new project. Let's start the testing from the beginning.
Let's write unit tests. Let's do the instrumentation from the beginning and all that stuff.
But, okay,
a factory obviously like does not work like this.
You might be a bottler for like, I don't know, like decades.
And you decide like to go and like instrument.
How long does it usually take, and what kind of investment in
resources from a company is required for something like that?
Yeah.
The way I've seen this work is companies that are doing this for the first time.
What they do is they choose a facility and within that facility, they choose a
specific part of it just to see how it works.
You want to get familiar with buying sensors, installing them,
running them, and then having a steady stream of data. And for that, I've seen
timelines vary a lot. So there isn't an exact timeline every company will get to. But
on average, just considering all the people who need to sign off and the time it takes to get
something running, I've seen something from six to nine months. And again,
installing the sensors is very fast. That's not even the point. The point is,
if you've never done this before, you need to understand, okay, I have membranes from this
company and I need this type of data. So who makes those sensors, right? It's not exactly a copy-paste
solution. So if you haven't done that, that's what it takes. And then going from
there to the next facility to the one after that would
be pretty fast, mostly because you know how to do it, you know who does what,
and you know how the system works.
So that's what I've seen.
And also the good thing is all the hardware that's being sold in the more
recent years, they already come equipped with all the sensors.
So if you built, or if you upgraded this facility in the last decade, you're fine.
But as you said, if you've been running this for like a few decades and you're doing this for the first time, that's when you need to think about, okay, how do I instrument this so that I can collect the data and get this running?
So that's what we are looking at.
And also you said one more thing about investment.
Investment really depends on how many sensors you buy.
Sensors are very, very low cost.
But the thing is, similarly, for example,
let's say you have 45 membranes.
Each membrane can be instrumented with just one sensor
or 100 sensors, right?
100 sensors obviously means you'll have amazing data,
very high level of detail
on every aspect of it. So let's again, companies decide, let's say they have a budget that allows
them to install only four sensors, right? Different types of sensors on a membrane.
Then they'll say, okay, since I only get four, I'm going to choose the critical ones like temperature,
pressure. So that's how it usually plays out. I've seen companies
that just want very, very rich data, and they just installed a whole bunch of
sensors, so they had a lot of budget. And I have seen companies where they're like, only one sensor
per, you know, big device. So again, the problem is if you don't instrument it enough, it'll feel like,
oh, I invested all this money.
Where's the outcome?
Right.
It's like there's a certain minimum level of data richness you need to actually make something happen.
So that's where it gets interesting.
So people plan and make sure that they don't invest like a million dollars and nothing comes out of it.
Kostas, can I interrupt?
Just one quick question.
Does the data format vary by sensor manufacturer
or is there sort of like an open data standard
for say like temperature or something?
Yeah.
Luckily, the way in which data flows is standardized.
It's basically time series data.
So people have mutually agreed upon this format where there's a
timestamp and there's a value, right?
So if you ping a sensor, all right, you'll get like a stream of data.
And in terms of frequency, again, it's customizable, but that's a good part.
If you go to like their data store and download like the last 12 months of data,
it's pretty nice.
It's a timestamp and a value, right?
And parsing that, obviously, it's fantastic.
There are so many time series oriented databases that you can use to explore that data.
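As a small illustration of how little is needed to work with the timestamp-and-value format described here, the following Python sketch parses such rows and averages them per hour using only the standard library. The CSV layout, column names, and readings are assumptions made up for the example; a real sensor export may use a different layout.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical export from a sensor's data store: one timestamp and one value per row.
RAW = """timestamp,value
2022-07-20T10:00:00,4.1
2022-07-20T10:20:00,4.3
2022-07-20T10:40:00,4.2
2022-07-20T11:00:00,5.0
2022-07-20T11:30:00,5.2
"""


def parse_readings(text):
    """Parse 'timestamp,value' rows into (datetime, float) pairs."""
    reader = csv.DictReader(io.StringIO(text))
    return [(datetime.fromisoformat(r["timestamp"]), float(r["value"])) for r in reader]


def hourly_mean(readings):
    """Group readings by the hour they fall in and average each group."""
    buckets = defaultdict(list)
    for ts, value in readings:
        buckets[ts.replace(minute=0, second=0, microsecond=0)].append(value)
    return {hour: mean(values) for hour, values in sorted(buckets.items())}


readings = parse_readings(RAW)
for hour, avg in hourly_mean(readings).items():
    print(hour.isoformat(), round(avg, 2))
```

At real volumes this grouping would normally be pushed down into a time series database rather than done in application code.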
That was interesting, actually.
I mean, like that's good that we have like consensus around data formats in at least one place in the
world.
Yeah, yeah.
Probably it helps that, okay, I mean, a time series is a time series.
You have time and you have some kind of numerical value there.
The semantics of that, the syntax, much easier.
Okay, back to what we were discussing.
So, okay.
Let's say we go and instrument, we start getting rich enough data,
all these things, and we store them in the data warehouse.
What is, let's say, the lowest hanging fruit that you have seen for companies
out there to go after as soon as they have,
let's say, the first data flowing out of their facilities?
Like, what's the first thing that someone gets as a return on investment from that?
Yeah, resource consumption is the lowest hanging fruit.
And the reason is, let's say that you install sensors, right?
And you made this investment.
You know, these machine learning tools
are built to enhance,
at least in this case,
enhance the work of the operators.
Now, so you cannot let people go.
That's not productive.
So apart from personnel cost, right?
Resource usage, meaning if you are a company that makes a physical product, you need to buy electricity from the city.
You need to buy chemicals from another vendor. You need to buy water, again, from the city.
So these are resources that if you consume less and still meet your throughput, nobody is going to be angry.
In fact, people are happy if their electricity bill goes down,
if their chemicals bill, if that goes down.
So the lowest hanging fruit is,
hey, we use X amount of resources to make one bottle of this beverage.
Can we reduce that bill by 20%?
Because it directly goes to the bottom line.
Like it's literally, it directly goes there.
So that is the resource consumption.
That's the first problem that gets attacked.
And within that, chemicals and electricity
are like the most expensive line items.
And also there's no objection.
If you go and present this to your entire team
and C-level, C-suite, nobody's going to object.
How dare you reduce our electricity bill?
Nobody does that.
So that's why we've seen a lot of people get excited
about pursuing resource consumption as a problem.
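The arithmetic behind "X amount of resources per bottle" and a 20% reduction can be sketched in a few lines of Python. All numbers below are made up for illustration, and the function names are hypothetical.

```python
def resource_per_unit(total_consumed, units_produced):
    """Resource consumed per unit of product, e.g. kWh per bottle."""
    if units_produced <= 0:
        raise ValueError("units_produced must be positive")
    return total_consumed / units_produced


def monthly_savings(total_cost, reduction_fraction):
    """Money saved if consumption drops by reduction_fraction at the same throughput."""
    return total_cost * reduction_fraction


# Made-up numbers for illustration only.
kwh_used = 120_000           # electricity bought in a month
bottles = 2_000_000          # bottles produced in the same month
electricity_bill = 18_000.0  # monthly bill in dollars

baseline = resource_per_unit(kwh_used, bottles)  # kWh per bottle
target = baseline * (1 - 0.20)                   # the 20% reduction goal
print(f"baseline {baseline:.3f} kWh/bottle, target {target:.3f} kWh/bottle")
print(f"savings at target: ${monthly_savings(electricity_bill, 0.20):,.0f} per month")
```

Normalizing by units produced matters because a lower bill in a month with lower throughput is not a real saving.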
And what are some challenges around this infrastructure?
Okay, obviously, the structure of the data is not such a big challenge.
But what are the challenges of working with this data?
Is it the volume of the data? Is it the reliability of the sensors?
What are some unique challenges there?
Yeah.
One would be the data standardization or lack thereof.
What I mean by that is, let's say you build a workflow
and it's supposed to get 14 columns of data, right?
And it works for one company.
You go to the next one and they say, oh, we don't have columns number seven and 10, right?
We only have 12.
So what do you do then, right?
And you go to a third company and suddenly three columns stop beaming data.
It just stops out of the blue.
So how do you handle these edge cases?
And also, again, this is a solvable problem,
but the names are all over the place.
Like company one can call it temp_562_ABC,
and company two can call it temperature_Batman.
And again, it's basically, we can do some automation to kind of figure that out or put
some people behind it to manually do it.
But again, it's not standardized.
So that's one.
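The naming and missing-column problems described here can be sketched as a small mapping step in Python. The alias table and the matching rule (split the raw name into alphabetic tokens and look for a known alias) are assumptions for illustration; they are one simple form of the "automation" mentioned, not the method Plutoshift uses.

```python
import re

# Hypothetical canonical columns and the name fragments that map to them.
ALIASES = {
    "temperature": {"temperature", "temp"},
    "pressure": {"pressure", "press"},
    "flow_rate": {"flow", "flowrate"},
}


def canonical_name(raw_name):
    """Map a customer-specific column name to a canonical one, or None if unknown."""
    tokens = [t for t in re.split(r"[^a-z]+", raw_name.lower()) if t]
    for canonical, aliases in ALIASES.items():
        if any(token in aliases for token in tokens):
            return canonical
    return None


def standardize(columns, required):
    """Return (raw-to-canonical mapping, required columns that are absent, raw names left unmapped)."""
    mapping, unmapped = {}, []
    for raw in columns:
        name = canonical_name(raw)
        if name is None:
            unmapped.append(raw)
        else:
            mapping[raw] = name
    missing = [r for r in required if r not in mapping.values()]
    return mapping, missing, unmapped


mapping, missing, unmapped = standardize(
    ["temp_562_ABC", "PRESS-01", "valve_state"],
    required=["temperature", "pressure", "flow_rate"],
)
print(mapping)
print("missing:", missing)
print("needs a human:", unmapped)
```

Anything that lands in the unmapped list is where the manual review mentioned above would come in, and the missing list tells the workflow up front which inputs a given customer cannot supply.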
Two would be that a lot of the physical infra that is very critical to them
is located at a distance, meaning it's not in
the office and you cannot log into it remotely. So a lot of times what happens is the connectivity
becomes an issue. And again, there are many kind of advanced facilities where connectivity is fine,
but we do see this as an issue where the data just stops coming in. And to fix that, somebody has to
drive all the way to the
site, somebody from their company, because obviously they have the permissions to do it.
So connectivity is definitely a key issue. And third is the output of the workflow. Meaning,
let's say we have productized this and the workflow outputs all these key metrics. Let's say there's metrics, there's event detection, there's X, Y, and Z.
Now every company wants something slightly different.
Like, Hey, I want to detect that event.
Oh no, no.
I want to detect this other event.
So one way to solve that is to let them stitch together what they need.
We have many workflow tools in the software world where you,
as a developer, go in, stitch it together, and it works exactly the way
you want. But in this world, since they're not software people, we don't
want to overwhelm them with all this
freedom and options so that nobody ends up using it.
So it's a bit of a trade-off: we want it to work out of the box.
At the same time,
we want them to have the freedom to customize.
So it's a bit of a trade-off on the product side.
So these are some of the challenges
that we face when it comes to physical infra.
All right.
And okay, we have the data.
The data is stored like in the data warehouse.
What do we do with the data?
What are the tools that we are using there?
And where does ML get into the picture?
Yeah.
In this case, the data goes through a few steps like pre-processing.
And then it goes through a model.
We extract information,
we present it to the operators. Now, where machine learning gets infused is when we have to detect
events of interest, when we have to prioritize all the different alarms, and when we have to
predict or estimate an event that's coming up soon. A simple example would be, let's say you're a beverage company
and you have 200 membranes working in series and parallel.
So raw water goes in, clean water comes out,
and you're supposed to maintain that infrastructure of 200 membranes.
Now, usually what happens is without any kind of software tooling, if you have to do it
manually, what you do is you do a round-robin, meaning you go to the first one, you check
if it's doing okay, then you go to the next one and so on and so forth.
So by the time you come back to the first one, it'll be like weeks or months.
And during that time, if it starts consuming 5x more electricity, and there
are many reasons why it might start doing that,
you won't know.
So you'll just get a giant bill at the end of it.
So if you want to detect an event of interest across these thousands of data streams, that's
where you need machine learning to show that, hey, out of the 10,000 events, here's the
event that is anomalous,
right? We detected it because it matters to you. Now go to membrane number 72 to fix it,
as opposed to doing a round-robin. Another example is predicting what's going to happen. Meaning,
if you don't do anything in about 24 hours, here's what's going to happen, right? So predicting an
upcoming event is another key issue.
And this is where machine learning comes in very handy
because the system itself, the baseline keeps changing.
Meaning when you install new,
when you build a factory first, right?
Everything is new, things are great.
As you operate it, the assets degrade,
the processes, they change, the baseline keeps
shifting, and you've got to adapt to the new reality.
So retraining becomes a part of your offering because every, let's say every 15 days or
every 30 days, you've got to update your own reality so that you don't call out something
as anomalous when it turns out it's not anomalous, it's just a false alarm.
So that's another situation where machine learning is super useful.
So yeah, so a lot of this is centered on extracting event information and
presenting it to the right people.
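The idea of replacing a round-robin inspection with "go to the asset that looks anomalous" can be sketched with a deliberately simple statistical stand-in. The Python below scores each stream's latest reading against a rolling baseline of its own recent history and ranks the assets. This is a z-score heuristic chosen for brevity, not the recurrent neural network models discussed in this conversation; the asset names, readings, window size, and threshold are all made up.

```python
from statistics import mean, stdev


def anomaly_score(values, baseline_window=20):
    """Z-score of the latest reading against the readings just before it.

    Only the most recent `baseline_window` readings form the baseline, so the
    notion of "normal" moves along with the asset as it ages.
    """
    history, latest = values[-(baseline_window + 1):-1], values[-1]
    if len(history) < 2:
        return 0.0
    spread = stdev(history)
    if spread == 0:
        return 0.0 if latest == history[-1] else float("inf")
    return abs(latest - mean(history)) / spread


def rank_assets(streams, threshold=3.0):
    """Return (asset, score) pairs at or above the threshold, most anomalous first."""
    scored = [(asset, anomaly_score(vals)) for asset, vals in streams.items()]
    flagged = [(a, s) for a, s in scored if s >= threshold]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


# Made-up power readings (kW) for three membranes.
steady = [10.0, 10.2, 9.9, 10.1, 10.0, 9.8, 10.1, 10.0, 10.2, 9.9]
streams = {
    "membrane_07": steady + [10.1],
    "membrane_72": steady + [50.0],  # suddenly draws about 5x the power
    "membrane_15": steady + [10.3],
}

for asset, score in rank_assets(streams):
    print(f"check {asset} first (score {score:.1f})")
```

Because the baseline is recomputed from recent history, a slow shift in what counts as normal is absorbed over time, which is a crude version of the periodic retraining described above.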
Okay.
So are we talking here more about, let's say, methodologies that have to
do with working with time series data?
Share with us a little bit more about the techniques and the ML tools
that are useful in this kind of environment.
Yeah.
Yeah.
Yes.
A lot of this analysis, a lot of this work is centered on time series analysis.
And with regards to the tools that we end up using, I'd say when it comes to
machine learning, recurrent neural nets are where a lot of the modeling work is centered.
I mean, we have different flavors of that model, but recurrent neural nets is what we end up using quite a bit because we deal with time series data a lot.
And in terms of the stack, Python everywhere, that's obviously our team's favorite language.
On top of that, there's a lot of TensorFlow.
And in terms of databases, obviously, we use time series databases to handle a lot of the data.
And in terms of the actual models, we built modules that do different tasks.
So detecting an event, that's a different ML module.
Predicting an event, that's another module.
Both could be recurrent neural nets trained on different data sets.
And that's what we end up deploying out in the wild.
And when we work with a new customer,
we start with the historical data, usually about two years'
worth of historical data.
All of that is timestamped.
It goes through a standardization step, meaning we want to convert all of that data into something that our platform can read and understand.
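A standardization step of this kind might look like the following Python sketch. The two input layouts, the field names (`ts_epoch`, `time`, `sensor`, `tag`, `value`), and the target shape of a UTC timestamp, a sensor id, and a float are all hypothetical; the conversation does not describe the actual formats the platform reads.

```python
from datetime import datetime, timezone

def standardize(record):
    """Map one raw record to a (UTC timestamp, sensor id, float value) tuple."""
    if "ts_epoch" in record:
        # Hypothetical source A: timestamps as epoch seconds.
        ts = datetime.fromtimestamp(record["ts_epoch"], tz=timezone.utc)
    else:
        # Hypothetical source B: timestamps as ISO 8601 strings with an offset.
        ts = datetime.fromisoformat(record["time"]).astimezone(timezone.utc)
    sensor = record.get("sensor") or record["tag"]
    return ts, sensor, float(record["value"])

raw = [
    {"time": "2022-07-20T14:01:00+00:00", "tag": "pump_3", "value": 42},
    {"ts_epoch": 1658325600, "sensor": "pump_3", "value": "41.5"},
]

# One uniform, time-ordered series regardless of where each record came from.
rows = sorted(standardize(r) for r in raw)
for ts, sensor, value in rows:
    print(ts.isoformat(), sensor, value)
```

Once every source is reduced to the same timestamped shape, the downstream modeling steps only have to understand one format.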
And then after that, we've automated the process of building the model, testing it, deploying it, periodically retraining it, monitoring it,
monitoring for things like drift, right? So these are some of the tools. And again,
along the way, we use all these different tools. Some of the tools are built in-house,
like for monitoring. For others, obviously, we use publicly available libraries that
have been tested out in the wild.
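The idea of periodic retraining combined with drift monitoring can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the in-house monitoring tool mentioned above: the drift score (shift of the recent mean measured in training standard deviations), the function names, the 30-day schedule, and the threshold of 2.0 are all made up for the example.

```python
from statistics import mean, stdev

def drift_score(training, recent):
    """Shift of the recent mean from the training mean, in training std devs."""
    return abs(mean(recent) - mean(training)) / stdev(training)

def needs_retraining(training, recent, days_since_training,
                     max_age_days=30, max_drift=2.0):
    """Retrain on a fixed schedule, or sooner if the data has drifted."""
    return (days_since_training >= max_age_days
            or drift_score(training, recent) > max_drift)

# Hypothetical sensor values the model was trained on, and two recent windows.
training = [50.0, 51.0, 49.0, 50.5, 49.5]
recent_ok = [50.2, 49.8, 50.4]
recent_drifted = [53.0, 52.5, 53.5]

print(needs_retraining(training, recent_ok, days_since_training=5))
print(needs_retraining(training, recent_drifted, days_since_training=5))
print(needs_retraining(training, recent_ok, days_since_training=30))
```

The schedule covers slow, expected change, while the drift check catches a shift that arrives before the next scheduled retrain.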
Yeah.
And how important is the domain expertise that the
customer has for setting up the platform?
Yeah.
In this case, we take responsibility for the setup work, because
it's software work, and when you work with a customer that makes
physical products, they're not, you know, they're not software people. So we don't want to burden them with the implementation, and we
take on the work of setup. Now, where the customer's domain knowledge comes in handy is
when the power users want to customize the platform for themselves. For example, we have
provided that freedom, that ability, to the user. Out of the box, it works. I would say a majority of the users,
they just want it to work out of the box.
They don't want to play around too much, but there are those power users who
definitely want the freedom and the flexibility to add or drop metrics.
And they don't want to call us for a
simple change, and that's a fair ask, which is why we built it.
Yeah.
Okay.
Cool.
That's all like super interesting.
It's very, to me at least, like very fascinating to identify like both the
similarities and the differences between like instrumenting the physical world
and instrumenting like software.
So it's great.
Like I couldn't stop thinking of like, maybe I should like just buy a few like
sensors and put them into my house and start measuring things and pretend that
I'm doing something important, you know?
But that gets me like to the next question, because it seems like we live like in, let's say, in a time where
I could do what I'm describing, right?
Like I could go and like buy a bunch of like sensors for a very low price.
And I don't know, like just monitor the quality of the air, like in my house, right?
It's pretty accessible, like to do that.
So where do you see things going when it comes to instrumenting the world out there?
And what kind of impact do you feel like this will have on, let's say, how we interact with the environment?
There's a lot of conversations lately about like climate change
and like all these things, right?
So how do you see this physical instrumentation
like being part of this conversation
of how humans like us interact
and work with our environment?
Yeah, the very fact that you start measuring something, you will start noticing it. You want
to improve it, right? You feel like you should do something about it. And that's what instrumentation
does: if you don't even know how much carbon dioxide you emit, it's like out of sight,
out of mind. It just wouldn't matter. But if you
instrument the physical infra and then the data is in front of you, you'll at least know, oh, okay,
to make a bottle of water or a bottle of beverage or a bottle of ketchup, we consume these many
resources and each part of it has this associated carbon footprint. So just the fact that you know that
will make you want to think about,
oh, okay, among all these parts,
I'll attack the lowest hanging fruit first,
meaning that part consumes so much carbon
that if I just take out that part,
it can move the needle, right?
So you'll be able to identify needle movers
for your company, right? And that's
pretty much the big part of it. If you're a company, if you're a publicly traded
company and your job is to make shoes or tires or beer, that's your primary focus.
As a company, that's your responsibility: to make that product and get it to customers and generate revenue.
So along the way, every other initiative falls to the sidelines.
So that's what instrumentation does: once you instrument your physical infrastructure, a lot of this will stay front and center, which will make you want to do something about some critical
problems.
Like are we consuming so much?
Consumption of chemicals, consumption of electricity.
Big companies, they use a lot of water to make products.
Now once you use that, you've got to discard the wastewater, which means you've got to
treat it first before you throw it out in the lake, or else the government is going to come after you and
many people will file lawsuits. So I think just knowing what's happening within your company's
infra, I think it drives a lot of action. And that's what's happening around the world: for all
the forward-thinking companies, sustainability has become part of their practice and just how they run their business.
Because not only do they get to meet their climate goals, but at the same time, they
get to reduce costs as well.
Nobody wants to have giant inefficient parts in their supply chain.
It's just that instrumenting everything is expensive, it's slow.
Or it used to be, but now people are doing more and more of that.
Yeah, so it's interesting because we started this conversation
with the metaphor of Datadog for the physical world.
Do you see space there for something that could be called
the Fitbit for the environment, for example, something that can help us, let's say, actually measure the
kind of impact that we have on the environment and consequently on the
quality of life that we have, and react to that?
Yeah.
Yeah.
Actually, it's a very interesting avenue, and many companies
are building their own versions of this product. Basically, if you run a facility, how do you keep track of its health? Health from many... Look at consumer products, like something that you and I can just go to a store and buy, versus what's available to companies, right?
So it's like, call it like, when we buy a device and we put it in the house, we call it IoT.
And the equivalent of that in the industry
is industrial IoT, like IIoT. So product experience, it's different, right? As consumers,
things just work out of the box. We get a thing, we put it here, it starts working. It's amazing.
So I think a lot of the newer companies like startups are bringing that product experience to the industrial world, which is driving adoption.
Because nobody wants the headache of sitting and figuring out the 500 steps you need to get one temperature sensor to start beaming data. That out-of-the-box product experience is, I think, very important, especially when it comes to
installing hardware for physical facilities. So I think I'm very excited to see a lot of the
newer, new companies, young companies building amazing hardware that just, it works like magic.
But obviously, if you're a Fortune 500 company, you want a supplier that can meet all your needs,
right? A startup will build
the world's most amazing sensor, but what about the 5,000 other types of sensors that a big company
would need? So that's where there's a bit of a balancing act going on. But I'm pretty sure pretty
soon some of these companies will grow up and will be able to handle a lot of the requirements.
Like Samsara, for example, great company, they built amazing hardware,
they went public, and now they have very, very big customers.
So it takes time to build that amazing product suite.
But yeah, I think the sooner the product experience comes
to the industrial world, the more people will adopt it.
Absolutely.
All right.
So one last question from me, and then I'll hand the microphone back to Eric.
So in terms of like instrumenting the physical world out there, like where do
we stand in terms of adoption right now?
Like, would you say that's like, it's 5%, 10%, 50% of like the industrial world,
at least out there has been, uh, instrumented or they are doing it?
Like what's going on out there?
Yeah.
We are definitely past the single digits for sure.
It's not 50% yet, because the physical infra economy is mind-boggling.
It's spectacularly huge.
And if you just look at it, I'm just talking about America, right? America, many companies, they're big companies, they have a lot of revenue, they have bandwidth
and time and resources to innovate, right? But many, many places around the world, they're like,
first of all, there's no budget, there's no time. There's no patience. There's just, there isn't enough money available to just innovate, right?
So I think we are still in the early innings.
At least in America, it's definitely trending in the right direction.
But when it comes to a lot of the physical infrastructure in Asia, where a large part
of the world's manufacturing happens,
that's where we are still in the very, very early innings. Only the richest and the most innovative
companies can afford to do that. But the good thing is, innovation always starts this way:
tech in the early days is expensive. You build it, you deploy it, you innovate, and then the cost goes down, the quality goes up, and then hopefully everyone else can catch up, right?
So that's where we are.
But I think that the market is so spectacularly big that even though we are in the early innings, there's a lot to do just in the US. And obviously, US, Europe, South America, Asia, it's an interesting dynamic that's playing
out.
But yeah, I think we are trending in the right direction.
Awesome.
Eric, all yours.
All right.
Well, one last question for you here, because we're getting close to the end, but it may
take up the whole time, because I have the sense that you're passionate about this.
So it seems like you started Plutoshift in part to have a larger impact, beyond just
sort of getting metrics to someone who's running a machine, operating a machine.
When it comes to sustainability, that's a really big,
that's a huge topic, right? In a number of ways, it cuts across, you know, sort of multiple vectors
of society from, you know, the way that we, you know, live our daily lives and recycling all the
way up to, you know, public policy, et cetera. So I'd love to know when you think about data and what you're doing at Plutoshift,
what do you see as the responsibility of data when it comes to sustainability? Or maybe
not responsibility, but in your mind, what's your vision for how it could potentially impact sustainability?
Yeah. Sustainability, as you said, there's so many. First of all, all of this falls under ESG,
and it's vast. There's so many things you can count under that. But when it comes to industrial sustainability, that's what
we are talking about here.
Data is a way to measure what's happening, be aware of it.
And also, I think, extracting information such that you can take operational
actions on a day-to-day basis to make the change.
So what I mean by that is, if you're running a facility, a lot of the impact you want to have,
meaning in the next 12 months, if you want to reduce your carbon footprint by 20%,
let's say if you have a goal like that, that has to happen if you take action on a day-to-day basis, meaning you cannot let the pump run for like two months on 5x electricity.
You cannot let the membrane eat up a lot of the electricity.
You cannot let that clarifier eat up 5x more chemicals.
So it's about hunting down these spikes, detecting, predicting these spikes, pushing them down. That's just one example of how to impact carbon footprint. So I would say data enables humans to take these daily actions. It's not like a big bang event where, okay,
I do something on this one day of the year and you'll meet the goal.
No, it has to be part of your day-to-day.
It has to be part of your processes, right?
Operations teams, they have to make it part of how they get work done.
And the data is the fabric that connects all of this.
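Hunting down consumption spikes day by day can be sketched as follows. This is only an illustration: the daily kWh figures are invented, and the rule used here (flag any day that exceeds twice the median of the previous seven days) is an assumption, not the detection or prediction method described in the conversation.

```python
from statistics import median

def find_spikes(daily_kwh, window=7, factor=2.0):
    """Return indexes of days whose usage exceeds factor x the trailing median."""
    spikes = []
    for i in range(window, len(daily_kwh)):
        typical = median(daily_kwh[i - window:i])  # robust to earlier spikes
        if daily_kwh[i] > factor * typical:
            spikes.append(i)
    return spikes

# Hypothetical daily electricity use for one pump; days 8 and 9 run at ~5x.
usage = [100, 102, 98, 101, 99, 100, 103, 101, 500, 510, 100]
spikes = find_spikes(usage)
print(spikes)
```

Using the median rather than the mean keeps the first spike from inflating the "typical" value, so the second spike day is still flagged instead of being absorbed into the baseline.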
Sure.
Follow-up question to that.
You know, one thing that's interesting when you think about having a better understanding
of the systems or the machines that you're operating, do you think that the type of work
that you're doing at Plutoshift is driving a lot of awareness? Because I'm just trying to put myself in the shoes of someone operating a machine and keeping the process running without problems, right?
And so like, of course, I would want things to be more sustainable, but
like my primary objective is going to be like operational excellence.
Do you see that dynamic a lot?
Oh, yeah, that's absolutely.
We see that quite a bit.
And it's a fair point, right?
Because end of the day,
if the operations team meets the requirement,
the throughput requirement,
they've done their job, right? So the person who is, for example,
a global VP or the C-level person,
they have to make things sustainable.
They have to achieve and meet their sustainability goals.
But the person running the facility, running the machines, as you said, their job is to meet the throughput requirements.
So that's why to make this practical, a product shouldn't introduce a new process. Meaning, if the
operator is doing these seven steps in their day-to-day job, you shouldn't introduce a new
process because, as you said, they're like, I don't need that. I know how to do this job,
and I'm going to do it. And I don't have the luxury of focusing on sustainability.
So the way we look at it is, you know, you look at
the seven steps, you look at the status quo, how the work gets done, and you accelerate one of those
parts or two of those parts or hopefully all the parts. You make it easier, better,
faster for them, so that to them, they're not following a new process. It's the same process, but now they get to save electricity.
They get to save chemicals.
They get to save water, meet the sustainability goals.
And that's where the trick is, right?
So simple thing, their job is to make sure all the membranes are running, right?
Instead of round-robin, we'll just tell them, oh, here, on your phone.
We'll tell you, go to membrane number 17 because that's the one acting up.
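The contrast between round-robin inspection and a targeted alert can be sketched in Python. The anomaly scores, the asset names, and the 0.8 threshold are hypothetical; in practice the scores would come from whatever detection model is deployed.

```python
def prioritized(scores, threshold=0.8):
    """Return only assets whose anomaly score crosses the threshold, worst first."""
    flagged = [asset for asset, score in scores.items() if score >= threshold]
    return sorted(flagged, key=lambda asset: scores[asset], reverse=True)

# Hypothetical anomaly scores for 20 membranes; only number 17 is acting up.
scores = {f"membrane_{i}": 0.1 for i in range(1, 21)}
scores["membrane_17"] = 0.95

round_robin = list(scores)   # status quo: walk past all 20, in order
todo = prioritized(scores)   # targeted: just the ones that need attention

print(len(round_robin), todo)
```

The operator's job stays the same (keep the membranes running); only the order and number of stops change.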
They're more than happy, right? They don't want round-robin. Nobody wants to do that. They
want to know, when I come into my shift at nine o'clock, what needs my attention, right? If there
is a right message with the right reason, they'll go to it. So I think that's how we are addressing this key, I would say,
product adoption element, right? So yeah.
Yep. Love it. Well, this has been such a wonderful
show. Thank you again for sharing your time with us. And what a treat to learn about a completely
new world of data that we haven't talked about on the show. So thank you for bringing a completely new topic to us.
Of course.
Thank you so much for having me on the show, Eric, Costas.
It's been a wonderful discussion.
I'm glad we got a chance to cover this topic.
Absolutely.
Thank you so much.
Okay, Costas, my takeaway is directly related to my big question from the beginning,
which is the standardized data format coming from these sensors.
That's just so wonderful to hear that that's not an issue.
I mean, he almost breezed over it.
He said, well, it's like a timestamp and a value, right?
And everyone agreed that that's how we're going to do it.
And he just kind of moved on, and I was like, man, if all data were
like that, it'd be a much easier world.
So yeah.
How about you?
Yeah, absolutely.
I mean, that's like a part of the beauty of time series data is that they're
simple enough that humans can agree to like how to represent the data, which is great, but again, like
things start getting complicated after you start including other data that are
also needed, like, I mean, we talked about ERP data, maintenance data, like
all that stuff are obviously like not as simple in their
structure as time series data.
But regardless of that, I think it was very, very fascinating for
me to hear how much progress we have made in being able to instrument
physical processes and extracting data and reusing like the existing
stacks and technologies and methodologies
that we have in software
to make some kind of sense out of this data,
which is great.
I think it's kind of a testament to the
universality that software engineering
might have, and how you can take a methodology
and apply it to many different things out
there. So that was, that was like a very interesting part of the conversation that we had.
Absolutely. What a fascinating show. Thanks for joining us again, and we will catch you on the
next one. We hope you enjoyed this episode of the Data Stack Show. Be sure to subscribe on your
favorite podcast app to get notified about new episodes every week.
We'd also love your feedback.
You can email me, Eric Dodds, at eric at datastackshow.com.
That's E-R-I-C at datastackshow.com.
The show is brought to you by Rudderstack,
the CDP for developers.
Learn how to build a CDP on your data warehouse
at rudderstack.com.