The Data Stack Show - 04: Relational to Real-Time with Change Data Capture with DeVaris Brown of Meroxa
Episode Date: September 2, 2020

In this episode of The Data Stack Show, Kostas Pardalis and Eric Dodds talk change data capture (CDC) with DeVaris Brown, co-founder and CEO of Meroxa. Their conversation digs into the benefits of utilizing CDC and how Meroxa is using it. Highlights from the conversation include:

- Introduction to DeVaris and Meroxa (3:24)
- Why CDC has more traction today (6:58)
- How CDC is changing the way we build products (12:52)
- Where CDC is playing an important role (21:11)
- The experience that Meroxa delivers (24:42)
- Looking at Meroxa's sources, technology and data stack (27:28)
- DeVaris' vision for the company (37:10)

The Data Stack Show is a weekly podcast powered by RudderStack. Each week we'll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data. RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
Transcript
Discussion (0)
Welcome back to the Data Stack Show.
Today we're going to talk about change data capture,
which as a concept isn't a brand new idea in terms of databases,
but is pretty interesting as far as the way that companies are leveraging it.
And on today's show, we're actually interviewing the founder
of a company who's built a product on top of the concept of change data capture.
So like RudderStack, they are building a product that fits into the data stack.
So I'm excited to hear about the company.
Kostas, do you want to explain a little bit about what the product is
and who we're talking to today?
Yeah, of course.
Today's episode is all about CDC.
As you said, CDC is not a very new concept.
I mean, as a programming pattern, it has been around for quite a while.
But today we are going to be talking about how CDC fits in the space of databases
and how this can change completely the way we interact with the data that the database holds.
So with DeVaris today, we will go through the product that they built at Meroxa,
which is actually a service that can be attached to your database
and turn your transactional database
like a PostgreSQL or MS SQL or MySQL
into a stream of data.
And that's very interesting
because it's a quite new concept
in terms of doing this on the database.
But it gives a lot of flexibility and enables, as we will see,
a large number of use cases on working with the data
that didn't happen before.
And any use case
where data needs to be real-time
can be enabled with this technology,
which is very fascinating, I believe,
especially not that much for engineers, to be honest,
because, of course, engineers are aware of CDC and this pattern of turning every possible interaction
in a piece of software into a continuous stream of changes.
But especially for people in marketing and other departments
who have so far been used to working with very high latencies
in terms of the data, which might be from hours to days,
even weeks in some cases.
And here, I think that marketing people are going to be
extremely happy to hear that they can get access
to customer behavior that is captured in their database
within seconds.
So we are going to go through some details about the technology, the product
itself.
It's a brand new product.
And yeah, let's see what DeVaris has to say about it.
Hello, DeVaris.
It's very nice to have you here today on this episode of the Data Stack Podcast.
Very excited for the conversation we have ahead of us.
Can we start by giving us a small background about you and Meroxa, the company that you
work for?
Yeah, yeah.
So thank you again for having me, Kostas.
It is an extreme pleasure to talk with you.
I mean, I know we met last week under different circumstances, but I'm definitely appreciative of the opportunity to talk with you
about what we're doing at Meroxa.
So prior, you know, I started at the beginning
and kind of worked my way up.
But prior to Meroxa, I was a product manager at Heroku,
building features for developer experience.
So if you've used Heroku in the past, like, few years,
and you've used, you know, review apps or CI, CD or chat ops
or GitHub integrations, any of that stuff,
like, they came out of the work that the team I led did.
And so I like to say that the products that I've worked on
have powered Silicon Valley, because before that,
I was a product manager at Zendesk as well.
And so I was on their API and apps platform.
So I know a lot of folks that kind of use Zendesk or Heroku
in their startup stack.
And then way back in the day, I was a software
engineer focused on
building
scalable systems at Microsoft.
So I've worked on
Windows Azure and
what was called Red Dog at the time.
Worked on
Hotmail. That was like my first
assignment.
But yeah, mostly, I know, man,
like every time I see somebody with a Hotmail address now,
I tell them thank you.
It's like you bought my mom a couch or something like that, you know?
So I appreciate it.
Yeah, man, that's me.
So yeah, that's pretty much what I've done in the past.
And then now I'm working at Meroxa.
And so Meroxa is a real-time data platform.
It's a service company.
We just believe that everybody should have access to real-time data infrastructure.
And it shouldn't be relegated to people who just have, you know,
huge data teams and unlimited amounts of money to engage
some of these bigger software vendors.
So we want to make it as accessible as Heroku was
for web app development.
So it's kind of what we're doing.
I'm the CEO.
I have a co-founder, CTO.
His name's Ali Hamidi.
We worked at Heroku together and basically saw the twinkle
in each other's eye around like, hey, we should probably be doing this type of experience work for real-time data systems.
So, yeah, that's me in a nutshell.
That's great. That's great.
By the way, I forgot something really important.
I forgot to introduce that today we have also Eric with us.
So it's going to be three people participating in the podcast.
So, yeah, Eric is also with us.
And moving forward, I mean, it's very interesting what you just said about what Meroxa is doing.
And everything, I mean, I think that what we are talking about here is something that's very commonly known in the engineering space as CDC, right? Like, change data capture. So let's discuss a
little bit more about that. I mean, it's some kind of, let's say, buzzword,
or like a development pattern, that has been around for quite a while.
There are products out there, like Debezium for example, that have been
developed for quite a while and they are getting more and more traction. But let's start with the basics.
So what is CDC and what is important and why now?
Like, why do we see today that there's more traction around using this kind of pattern?
Yeah.
So, I mean, you know, CDC stands for change data capture.
And if you look at like how the world has been structured previously,
it's all been around tracking events.
And so what you would need to do is basically, if I work for Segment, or if I
choose Segment as a platform,
I literally would have to go and spend a few weeks building out these, like,
event plans and figuring out all the things that I want to capture in an event, and then have that get sent to, like, Segment.
And then you have, like, this buffet list of things that you can kind of go out
and pick from, and then it usually takes around, you know,
a couple of months for, like, this stuff to get done end to end. Right.
But the main things to remember is like,
it requires a change to your application to get this stuff to work.
And so, you know, what's been happening for years behind the scenes is that people are actually tailing the logs of their database, right,
and doing this in a way that it basically captures every single transaction that happens in the database.
Because, you know, whether or not I make a call to an API in some way, shape, or form, that information is going to end up in the database, right?
And so you essentially have a list, a very granular list of events and things that are happening on your system with respect to data.
So people just say, yo, change data capture as a pattern makes a lot more sense because now I can get that high-grade fidelity of information from the database
rather than having to go instrument my app for events.
Now, you're basically kind of kicking the can down the road eventually, right?
Because, you know, now I don't necessarily have to go define a schema
or any of that stuff.
I just pull in these raw events, and then I do some processing
and dump it into, like, a data warehouse.
And now I can write SQL at that point, right? And so it really like lessens the developer time
dealing with, you know, how do I capture this information?
And so, you know, kind of what Maroxa saw
was that like, you know, at Heroku,
we use Debezium behind the scenes, right?
But we saw that change data capture
wasn't always the best way to get information out of the database, right?
Or, you know, even information from a data source in general, right?
So, you know, a lot of these companies, you know,
SaaS companies or any of that, they don't give you access to the underlying data source underneath.
And so you need to have other schemes as to how you can pull data.
Also, with those data sources,
you have variations as to what capabilities
that they have, right?
So if I'm using anything less than Postgres 10.4,
you know, I can't use pgoutput,
I can't use a logical replication slot.
So CDC doesn't necessarily work, right?
And same thing on the MySQL side, right?
And so what Meroxa did, essentially, behind the hood,
is that we basically built, like, a decision tree to understand, like,
okay, if, you know, the data source meets these requirements
and we have these specific sets of permissions,
then we can use change data capture and pull from, you know,
Debezium and, you know, pull that into whatever else that we need down the road.
But if it doesn't, then we can use JDBC client
and write a SQL query there.
Or if that doesn't work,
then we just fall all the way down the pole.
And it basically just looks the same either way,
any connection scheme that we have.
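The fallback logic DeVaris describes can be sketched in a few lines. This is an illustrative-only decision tree, not Meroxa's actual code: the field names (`supports_logical_replication`, `jdbc_reachable`, and so on) are invented for the example, and the real checks would inspect a live connection.

```python
# Hypothetical sketch of the connector decision tree: prefer log-based CDC
# when the source supports it, otherwise fall back to polling via a
# JDBC-style SQL query, and finally to bulk dumps.

def pick_connection_method(source: dict) -> str:
    """Return the best available way to pull changes from a data source."""
    # Log-based CDC needs logical replication (e.g. Postgres wal_level=logical)
    # plus permission to create a replication slot.
    if source.get("supports_logical_replication") and source.get("can_create_replication_slot"):
        return "cdc"  # e.g. Debezium tailing the write-ahead log
    # Otherwise, fall back to periodic SQL polling over a JDBC connection.
    if source.get("jdbc_reachable"):
        return "jdbc_polling"
    # Last resort: full periodic dumps of the tables.
    return "bulk_dump"

print(pick_connection_method({"supports_logical_replication": True,
                              "can_create_replication_slot": True}))
print(pick_connection_method({"jdbc_reachable": True}))
print(pick_connection_method({}))
```

Whichever branch is taken, the output "looks the same either way" to the rest of the platform, which is the point DeVaris makes about not exposing source quirks downstream.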
And so that gives us the ability to not focus on the bugs
and the quirks that are in the underlying data sources.
We just deal with it from a platform
perspective. The main thing is
that change data capture gives you so much more
fidelity and granularity of the data that's flowing through
your system versus I'm just going to lift and shift.
If I do a select star from orders, a select star from users,
I just see the end result in my data warehouse.
Versus with change data capture, I can see, like,
oh, there was an upsert where, you know,
Kostas, you know, changed the age of this person,
blah, blah, blah, or, like, you know, those types of things.
And I'm seeing those changes in real time versus just the end result.
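To make the contrast concrete, here is a minimal sketch of what a log-based change event can look like, loosely modeled on the envelope Debezium emits (field names simplified for illustration). Unlike a `SELECT *` snapshot, each event carries the operation and both row states, so you can see how the row changed, not just the end result.

```python
# A simplified change-event envelope: op code, timestamp, and the row
# before and after the change. Real Debezium events carry more metadata.
change_event = {
    "op": "u",                      # c = insert, u = update, d = delete
    "ts_ms": 1599004800000,         # when the change hit the database log
    "before": {"id": 42, "name": "Kostas", "age": 29},
    "after":  {"id": 42, "name": "Kostas", "age": 30},
}

def describe(event: dict) -> str:
    """Render a human-readable diff from a change event."""
    if event["op"] != "u":
        return event["op"]
    changed = {k for k in event["after"]
               if event["before"].get(k) != event["after"][k]}
    return f"updated fields: {sorted(changed)}"

print(describe(change_event))
```

A batch snapshot would only ever show the final `age` value; the change event shows that an update happened, when, and to which columns.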
Yeah, I thought it was very interesting.
And it's a very refreshing way of looking into how to interact with data
and how to integrate data together.
I mean, I'm coming from like the more traditional ETL,
the way of dealing with data and moving data around.
And I remember that with all the data warehouses out there,
like one common modeling problem that always existed
is how we can mix the transactional data
that we get from the database
together with the event streams that are coming
from clickstream data
or like the events that we are capturing on a website.
And the traditional way that we tried
like to solve this problem is by turning,
let's say the event streams
into something that resembles closer
like the tables that you had
coming from the databases. But it didn't really scale that well, or work that well, because
you lose something very important: the time resolution, or the time dimension, that you have
when you are dealing with CDC-created data, or event streams in general. And for me it's
surprising, in a good way,
to see that probably like at the end,
the solution to this modeling and data problem
is actually to turn the database into an event stream
instead of trying to figure out
how to effectively save the event streams
into a relational database.
So that's something that really excites me,
to be honest, with what you are doing at Meroxa.
And I'm really happy and excited to see how people are going to use it.
Let me move to the next question, which is about,
how do you see change data capture affect the way
that we develop and build products, even if these products
are, like, software products or, as we started using this term
more and more recently,
data products.
So the things that we do with the data
and turn them into value inside the organization.
Yeah, I mean, I think, you know,
everybody's kind of stuck in this batch world now, right?
Which is like, you know,
just let me write a SQL query
and I'll populate a data view, right?
And so the problem is, is like SaaS tools have figured out that like real-time
and our event-based architectures can provide better customer experiences, right?
So, you know, we were talking to, you know,
I had a proof of concept with a hospitality company out of Vegas.
And you think about like a weekend in Vegas, right?
And for a VIP customer, you know, you go to MGM Grand
or the Wynn or something like that, you know,
you see the separate line for like the Pearl member
or whatever it is called.
And basically like anytime a VIP person wants to check in,
I want to send them a list of specialized offers, right?
And personalized offers.
Well, the company that we were talking to,
they're basically doing like a, you know,
daily backups and dumping that into their data warehouse.
All right.
So you think about a weekend in Vegas,
I get there on a Friday and then I leave on a Sunday.
All right.
Because that backup job takes 18 hours and fails over 70% of the time,
less than 10% of the people were actually just seeing that offer in time.
All right.
They had Braze or Iterable and all these kind of, like, marketing automation companies installed.
But just because they couldn't get the data in a format in real time, they were losing over eight to nine, you know, figures of revenue at the end of the day, right?
And it's just like, you know, some salesperson from, you know, the Hadoop world or the Spark
world probably sold them something.
And I mean, which is the case, and it's taken them basically like a year to get it all in
the same place.
But the problem is, is like, you still need your data to get into those systems for it to actually be useful for you.
And so with change data capture, you can start getting, like, finer-grain fidelity than what you would with the database backup, right?
Which is like, oh, I saw this person check in.
But like, let's say, you know, you got IoT devices all around your casino or your hotel property, right?
Now you can say, oh, I see this person checked in over by the slots that's close to this
restaurant.
I received this, you know, this event from Change Data Capture, right?
Or, you know, it hits the database because of my IoT devices.
And now I can basically send them an offer that's like, oh, you're close to this restaurant.
Here's a deal for you, right?
Like, you wouldn't be able to get that with my, you know, daily backup job that takes 18 hours and fails over 70% of the time.
And so these like data-driven experiences, people are just more used to this.
I mean, your end users and customers are just used to having highly relatable, highly personalized content.
And you need to have that real-time capability,
not like batch or like micro-batch.
You basically need to know, like, at this moment,
what is the most relevant information that I need to make a decision
to put in front of a customer so that they can act on, right?
And, like, that's the big shift that's happened.
So, you know, you look at all of the, you know,
the bandwidth from networking like 5G and all that type of stuff,
like you literally have a ton of data at your disposal that you can use
to provide a better end-user customer experience,
have better analysis, all of that.
And it's like this information is already sitting in your database.
You're just not utilizing it because you don't know the best way to get it out.
And so, you know, for us, it's just like,
no, we'll just provide an easy way for you to get that out initially
and ongoing so that you can just use it however you want to,
whether it's search, recommendations, marketing, automation,
you know, data warehousing, analytics, all that type of stuff.
It just makes more sense to continuously pull that stream
versus having to do these huge incremental
backup jobs.
Would you say that
it seems
like Meroxa enables
you to
get
extremely modern functionality
out of your existing
system without having to overhaul the
entire pipeline, right?
Because you can achieve real time by building like an extremely sophisticated pipeline.
But I mean, doing that at an enterprise organization is, you know, a year long effort or more,
just because it touches so many things. Is that, would you say that's a strong use case for Meroxa?
Yeah, yeah, it is.
I mean, like that's literally our use case.
Is that like you just point us at a data source
and we figure it out and we give you the ability
to kind of multiplex that data stream
into a bunch of different places.
And so you don't have to like rip and replace.
You can just have us alongside and just say,
yo, I want this to be my GDPR, CCPA data pipeline. All right? And it's like, data flows from our database, I do some
streaming processing, and I put it out to the world, you know, into, you know, my data warehouse, into
a real-time API that we provision. But you still have your, you know, production data sources, all
right? And, like, you know, you don't necessarily need,
like the whole point for us was to evolve the conversation from integration
to orchestration, right?
Because we saw a bunch of data engineers,
I interviewed a bunch of data engineers and they're like, look,
70 to 80% of my time is just putting this stuff together, figuring out,
you know, how to, you know, get my Kafka to talk to, you know,
my sales force and then
put this stuff either in S3
or Snowflake or
like, just all
that stuff takes time and you need to have
expertise on each one of those data
components. But we're just like,
yeah, just assume all this stuff could
connect. Just figure out where
do you want your data to be at this point?
And like, reducing that kind of operational overhead for maintenance and provisioning,
it just allows you to be more expressive and experiment more
and just be able to add more customer value with these kind of data-driven experiences.
So as the end user, let's just take DeVaris' example,
just because I'm just thinking of so many situations I've faced in the past
where having that functionality would have been amazing.
But let's say I'm running Braze and I'm responsible for the personalized
offers.
Like if I'm running the personalized offers program,
does anything change for me when you enter the picture?
Or is it that I'm just now getting a real-time data feed that I can leverage?
Yeah, you're just getting a real – so a couple ways that it can change for you
is that because you can do the preprocessing inside of the stream itself,
you don't necessarily have to worry about downstream dependencies and,
and all of those types of things, right? Like, it just, you
can do, you know, your concatenation, your augmentation on the stream.
And then, so by the time it hits Braze,
you have a better way of actually like targeting your customers at that point.
Right. And so, you know,
it's just one of those things where it's just like,
because you have this real time data, you can actually, because Braze splits out campaigns into recurring campaigns and then point-in-time campaigns.
So if a user takes a specific action, now I can do this thing and opt them into these campaign buckets. So if you start thinking about it, now I can basically define, at a fine granularity,
what that action is
and have more associated data
to say, okay, well, here's the type of experience
that I can provide them now,
because I'm sending up a specific type of data stream to Braze.
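The "preprocess on the stream" idea can be sketched as a small enrichment step that runs before an event reaches a tool like Braze. This is a hedged illustration: the function, field names, and VIP rule are all invented for the example, not Meroxa's or Braze's actual API.

```python
# Hypothetical stream-side enrichment: concatenate and augment a raw
# check-in event into a campaign-ready record before it hits the
# marketing tool, so downstream campaigns can key off derived fields.

def enrich_for_campaign(event: dict, vip_ids: set) -> dict:
    """Augment a raw check-in event with derived, campaign-ready fields."""
    record = dict(event)
    record["full_name"] = f'{event["first_name"]} {event["last_name"]}'
    record["is_vip"] = event["customer_id"] in vip_ids
    # Point-in-time campaigns can opt users in based on this action field.
    record["action"] = "vip_check_in" if record["is_vip"] else "check_in"
    return record

raw = {"customer_id": 7, "first_name": "Ada", "last_name": "Lovelace"}
enriched = enrich_for_campaign(raw, vip_ids={7})
print(enriched["action"], enriched["full_name"])
```

Because the enrichment happens in the stream, the downstream tool never has to know about the upstream schema or its dependencies.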
Very cool.
That's great.
So I think it's pretty clear
that marketing is like a very big use case
of using real-time data.
And I think latency is also something
that is important for marketing.
Have you seen so far any other use cases
for CDC in general,
but also specifically for Meroxa
and your product?
Where else have you seen CDC playing a more important role nowadays?
Yeah, I think for us, the use cases that we see are instant data warehouse.
If I pick one of these ELT or ETL solutions off the shelf, I essentially have basically, you know, I'm kicking the can down the road to get this information to my data warehouse, right?
And so I have to basically just run these like batch jobs.
And so it just kind of runs, you know, you kind of run into race conditions depending on how big your data sets are.
And so for us, the main thing is people are just like,
I just want my data warehouse to be an accurate reflection,
a real-time accurate reflection of our entire data picture.
And so that's the number one use case that we have.
Other use cases that we have is, like, you know, you think about a platform engineer
that uses data, right?
So, you know, data warehouses are not transactional.
They're absolutely terrible for that.
So, usually what I do is I write some SQL queries to build data marts, and then the
platform engineer basically builds an API on top of those data marts for like, you know,
dashboards or, you know, whatever you may need.
Well, what Meroxa does underneath the hood essentially is like
we give you the ability to point your data stream to an API endpoint.
So now you essentially have gRPC, GraphQL, or RESTful API
that is consuming this real-time stream.
And so you think about like real-time compliance at that point, dashboards,
real-time search indexing, right?
Like, I can point that stream over to, you know,
my Elasticsearch cluster, my Algolia cluster,
and continuously do these things, right?
Like, you think about this from an e-commerce aspect,
hey, I might want to have real-time inventory, right?
Like, you know, you go to a website,
and you're like, dang, I really like this.
And then when you click add to cart, it just says, oh, sorry, it's out of stock right now,
right?
Like, you know, that's just a terrible customer experience to have.
And so having real-time data kind of facilitates all of these interactions.
And I think that's the whole point of change data capture, right?
It's like, you know, you start pulling these changes in real time,
you do some little stream processing,
and it gives you the ability to have an accurate reflection
in any way, shape, or form that you want to have
your data take place inside of your application.
That's the real power.
And unfortunately, like a lot of these other tools,
they'll pull stuff on regular intervals,
but you'll end up in, like, it's 30 minutes, I believe, is, like, the default
standard for pretty much everybody else.
And if you want to pay an extra two grand, you can get it down to five minutes.
But then it's like, at that point, it's like, okay, well, a lot can happen in five minutes,
right?
Like, especially around Black Friday or like some sort of sale or something like that,
right?
Like, you just want to make sure that you have an accurate reflection.
And that's really the, you know,
the benefit of using change data capture.
Yeah, that's great.
So I think it's also like a good point right now to discuss a little bit
more and get some more information from you about the product itself.
So can you, you know,
help us understand a little bit better how's the experience with the
product itself? I mean, what someone can do? What is your user? First of all, I mean, I assume it's
probably someone with more technical background, but who is the user of Meroxa? And what kind of
experience you deliver to them? Yeah, I mean, our users are data engineers right now. Data engineers, or data-
aware engineers. Like, you know,
an engineer that kind of knows system design
that might get assigned, you know,
the duty
of making pipelines
for a sprint
or two, right? Like,
we want those people. And we're focusing
on SMB and mid-market specifically
because at the top end of the market,
I mean, you got a ton of folks, right?
Like Nexla, Ascend, StreamSets, Confluent.
I mean, like there's a ton of people doing that
as well as the open source community, right?
Like, so you got, you know,
Netflix putting out open source stuff
and all these things.
Google, all the big substrates have their own vendor-specific way that you can kind of do these things.
And I think the real reason we saw this was, like, that one or two person team, and making that experience super easy for them to build this infrastructure and not really have to think about it. Do I just, like, buy one of these off-the-shelf solutions and then spend, like, a million dollars and six to 12 months building it,
or do I just pick up something like Meroxa,
and it's like as fast as I can type a command,
I have the same thing that like Netflix and Slack
and like kind of all these bigger companies are using behind the scenes, right?
And so like that to us is kind of our advantage, right?
Like we're decidedly
focused on the smaller folks, so we can build a, you know, build a community around real-time streaming.
Right? Because a lot of people don't even know that, like, this is possible for them to do, right?
It's like, the first thing they do is like, oh, I'm just going to pick up Segment, and that's going to
give me the ability to, you know, kind of get these events and all that type of stuff.
And it's like, yeah, kind of, right?
Or RudderStack, right?
Like, you know, they can do that,
but it's like, that's also a huge engineering leap
for you to start instrumenting your app
and like thinking about all the events
and, you know, making sure that you have
your metrics playing in order
because you're basically going to have to end up
doing ETL anyway, because, oh man,
I forgot to put this field in the user login event.
Like, you know, like that type of thing, right?
Like that happens all the time.
So it's just for us, it's just kind of like, it's just an easier way for us to attack that
kind of SMB audience.
And that's what we really want to go focus on.
Makes sense.
So what kind of sources you're supporting right now?
What kind of technologies I can turn into a stream of
changes that I can capture?
You can turn an
API endpoint into a
stream.
Then we have a list of databases.
It's
Cassandra, Postgres, MySQL,
Mongo,
Oracle, and I think MS
SQL is on the way.
And then you can point it to basically like a URL
or web hook address, right?
And so you can do that.
Kafka itself, you can point us to a Kafka stream,
S3, file stores, GCP, all the AWS stuff, GCP stuff.
We got basically kind of in Snowflake, right?
So we're kind of creating that world.
And how can I access the data that have been,
I mean, these kinds of streams of changes,
like, how can I interact with them, and how can I access them,
and most importantly,
how can I integrate this in my product?
Yeah, so basically the way that you interact
with the data is you have a few ways.
So basically, underneath the hood,
we go from CDC into a Kafka cluster.
So you can actually access the raw topics if you want to, right?
Like, it's just Kafka underneath the hood. And then you
have the ability to query.
So you can basically
with any resources you provision, we
provide a slash query endpoint.
You can write ANSI SQL to query
that data. And then we actually
expose that stream
eventually, you know, if you want to create an
API endpoint, right? Like, so we auto-provision
the API endpoint for you, or you
just dump it into, like, an S3 or file
storage, and you can use, you know, kind of
like Presto or Athena to go
through your data lake, or you can put it
into a data warehouse, whatever one you pick.
You know, we support Redshift,
BigQuery, and Snowflake,
and so you can just use SQL there as well.
And then there's also an interesting way, too,
is if you want that output to be code,
so we expose functions as well.
So you can point the stream to a function,
and then that can infinitely do whatever it is
that you need to there as well via code.
So you have a few different ways
that you can access that information.
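The last access pattern, pointing the stream at a function, can be sketched as a small event handler. This is illustrative only: the `handle` signature and the sink names are hypothetical, not Meroxa's actual function API.

```python
# Hypothetical "stream to function" handler: each change event arriving
# from the Kafka topic is passed to user code, which can route it to a
# search index, a warehouse, or anything else.

def handle(event: dict, sinks: dict) -> str:
    """Route one change event to the appropriate downstream consumer."""
    table = event.get("table")
    if table == "orders":
        sinks["search_index"].append(event)   # e.g. keep a search index fresh
    else:
        sinks["warehouse"].append(event)      # everything else to the warehouse
    return table or "unknown"

sinks = {"search_index": [], "warehouse": []}
handle({"table": "orders", "op": "c", "after": {"id": 1}}, sinks)
handle({"table": "users", "op": "u", "after": {"id": 2}}, sinks)
print(len(sinks["search_index"]), len(sinks["warehouse"]))
```

The same event stream can feed raw Kafka consumers, a SQL query endpoint, an auto-provisioned API, or code like this, which is the multiplexing DeVaris describes.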
Okay, that sounds great. So you mentioned a few things about the underlying
technologies that you are using to build the product. Can you go a little bit deeper into
that? Like what kind of technologies that you are using? What kind of frameworks? You mentioned
Kafka. At some point, you mentioned Debezium. I don't know if this is also part of your stack.
Can you give us an idea of what's the stack like behind Meroxa?
Yeah, definitely.
I mean, all this will be open source in a couple months anyway.
So the real magic that we found was not in the data plane.
We just basically have a curated set of open source projects
where we know they can function for this job.
Our IP is in the control plane, like the puppet master.
So all the scaling, maintenance, and all that type of stuff,
that's our IP.
But we basically used Debezium into Kafka.
And then the reason why we picked Kafka
or an event streaming format,
twofold. One, Ali, my
co-founder, is one of the world's foremost experts
in Kafka. And so
at Heroku, the engineering
team for our department
of data, especially on the real-time serving
aspect, was about eight people.
And they did...
I always get this wrong,
millions of Postgres databases,
thousands
of Kafka clusters
for tens of thousands of customers
and hundreds of,
excuse me, hundreds of millions of
requests per minute. And the team
was eight people, right?
And so, you know, it's like we have this expertise,
but also Kafka basically allows you to shrink the footprint of, like, building connectors, right?
And so now we don't have to have a specific
like Salesforce to Braze connector.
It's just everything talks to Kafka anyway.
And so now it's just our duty to get information into Kafka and out of Kafka.
And so, like, that's really what we do underneath the hood.
And that's really it, man.
I mean, you know, we just know how to run this infrastructure better than anyone else.
You know, you talk to anyone who's running Kafka, the number one thing is like,
damn it, I got to, you know, learn how to tune the JVM,
got to deal with Zookeeper, all this stuff. And it's kind of like, yeah, we figured out how to
do this at scale. So, you know. Yeah. That makes sense. I mean,
managing Kafka clusters is an interesting experience. I mean, I had like this experience also,
like in my previous company,
that we had like built the product around Kafka and it's a very powerful technology,
but there are many moving parts there
that you have to have right
if you want to operationalize it
and make sure that it's always there
and working as expected.
Yeah, it makes a lot of sense.
So you mentioned that you're going to open source your software.
So that's like an interesting question that I have is,
how do you think about open source?
How important do you think it is?
And why are you actually thinking of open sourcing your technology?
Yeah, I mean, look, we're system engineers before software
engineers, right?
We want to stand on,
I mean, we are standing on the backs of giants,
right? So, you know,
open source is vital and crucial; it's like the lifeblood of what Meroxa is,
and to be a good citizen in this
space, we not only
have to leverage these things,
but we also have to contribute back, right?
And like, we fully recognize that, right?
Like, we're not like some of the other folks
that are just like, ooh, interesting open source project.
Let's just commercialize that
and basically throw them the middle finger
in the rear view mirror as we collect billions, right?
Like, that's not what we want to be.
And so for us, it's just more so like,
we know that, like I said before, you know, the IP isn't necessarily in the components themselves.
IP is like how you stitch those things together and run a platform and operationalize it, right?
And so if we make a change, like, you know, we put these things together.
I'll give you a perfect example, right?
Like, you know, we built all this stuff,
and then, you know, our first connector was Redshift, right?
Redshift, the Kafka Connect plugin for Redshift, only does single-row inserts.
So if you imagine somebody that has a
gigabyte's worth of data doing
single row inserts at a time,
it's probably going to take you a while.
And so what we did was essentially, you know, we forked the Redshift connector, added the ability to do multi-row inserts, and then just contributed that back upstream.
And like, that's something that
you know, a lot of people
at scale, you see these
kind of like limits, but you know, the
general public is like, oh, I'm just going to
take data from a database and put it into
Redshift, and it should be fine, right?
Like, these are the types of things that we see. We'll leverage the community first, but where it doesn't fit our customer needs, we'll, you know, basically do some software engineering and then contribute the changes back to the ecosystem.
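[Editor's note: to illustrate the single-row versus multi-row insert difference DeVaris describes, here is a sketch in Python. This is not the actual Kafka Connect Redshift connector (which is Java); it only shows the idea of batching many rows into one statement.]

```python
from typing import Iterable, Sequence

# Hypothetical illustration only: it shows why batching rows into one
# INSERT statement beats issuing one statement per row at volume.

def single_row_inserts(table: str, rows: Iterable[Sequence]) -> list:
    # One statement (and one round trip) per row: slow at volume.
    return [
        "INSERT INTO {} VALUES ({})".format(table, ", ".join(map(repr, row)))
        for row in rows
    ]

def multi_row_insert(table: str, rows: Iterable[Sequence]) -> str:
    # One statement carrying the whole batch: far fewer round trips.
    values = ", ".join(
        "({})".format(", ".join(map(repr, row))) for row in rows
    )
    return "INSERT INTO {} VALUES {}".format(table, values)

rows = [(1, "a"), (2, "b"), (3, "c")]
print(len(single_row_inserts("events", rows)))  # 3
print(multi_row_insert("events", rows))
```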
And so for us, like, open source is a strategy,
like our go-to-market strategy,
because honestly, like, we see at the end of the day,
like, there really is no expertise.
Like, what is the 12-factor app for data?
Like, where's that expertise
at to say, like, oh, in real-time systems,
this is how you should be thinking about things.
This is how you should connect things. If you're
running, like, microservices, this
is how you should be architecting, you know,
your real-time data. Like, there
isn't that just general consensus.
Like, it exists in the big companies, right?
Like, at Google and Facebook and Netflix and Uber and all that type of stuff.
But that general knowledge,
unless you know somebody that knows somebody
that has done this before,
it just really isn't available.
So we want to basically use open source
to democratize not only the access
to this kind of power,
but just the education as well.
You should be doing things
better than what you're doing, even if you are one of these big companies.
Yeah, yeah. And I think it's a very good point, what you said about education. I think education is a very big part of building this kind of product. But okay, nobody actually has prior experience, right? It's not like a toothpaste, it's not like a can of beans or whatever. I mean, it's something that, okay,
like you build it, you put it out there,
there is value, but okay,
people also need to understand like why
and how to use it.
And I think that open source,
like it's an amazing tool to actually
accelerate the process of educating
the people out there, and democratize the knowledge at the end. Let all the engineers out there do the best that they can, because that's, at the end, what they want to do.
That's great. Moving to the last part of our conversation, just a few questions around
the company itself. How long have you been out there and what is the current status?
I mean, are you hiring?
Yeah.
I mean, the company was incorporated in October of 2019.
Our first commit was January 9th.
And so we are hiring. We'd love to find a senior back-end engineer that's focused on Go.
So if you're that person, I'm devaris at meroxa.io.
We would also love to find a front-end person, mid-level, junior, doesn't matter.
But yeah, we are hiring.
And one of the things I would like to say too is that we are very intentional about building an inclusive culture, an inclusive team. So at least 25% of our company will be women. And that's something that I am very steadfast in, making sure that we are creating an environment that is not going to perpetuate the traditional Silicon Valley tropes. We are remote first, so you can be anywhere and everywhere. And then the other thing too is that we're adults.
So me and Ali have been in startups before.
And so the things that we know people care about is work-life balance.
Everybody has this idea that startups are just absolute chaos all the time.
But planning and focused execution is something that can help alleviate that.
So, like I said, we're adults.
We've been there and done that before.
And then we pay people like adults. You know, we know that, especially at the senior level, Netflix is giving you a million dollars, and Facebook and Google are starting to throw the kitchen sink at you. But, you know, we raised a little bit of money, so you'll have more of an impact, and you'll be paid commensurately for your level of contribution and the impact that you can create.
Yeah, yeah, yeah.
Also, experience is an important aspect of choosing to join a company at this early stage. I think the experience of being part of such a team, and seeing what it means to grow fast and how you can grow together with this team, is an amazing experience and an asset for everyone. Yeah, especially when the people from this company are great people like you.
Appreciate that, man. Yeah, I mean, we got great investors. Amplify, who's invested in folks like Datadog and Prisma and all that. Root VC, who's traditionally a hardware-focused firm, but the partner that we're working with, Lee, who led our round, was VP of engineering at Teespring. We got the Looker co-founders on our cap table, Ben and Keenan. Nick Caldwell, who's the chief product officer at Looker, is on our cap table. Jason Warner, who's the CTO of GitHub, is on our cap table. Adam Gross, who's the former CEO of Heroku, is on our cap table. Fred, who's the CTO at The RealReal,
as well. So, like, we got a community of folks. Dion Nicholas, sorry, Dion, you know, if you hear this, my bad, man. Every time I get a long list of things, I always forget one or two.
I mean, it's just like we've got an all-star group of people around us that really are technical and want to see us succeed.
I mean, Chris Riccomini over at WePay just came on the cap table. He did Apache Samza, right? We got Jesse Peterson, who's the head of data at Autodesk, on our cap table as well. Like, we have so many investors and advisors that have just been there and done that, that just know, like, yo, you might want to stay away from this because they've seen it happen before, or yeah, that's a great idea, right? It's just one of those things. It's so exciting. I don't know about you, like when you started your company, right? All the experiences that I've had, all the relationships I've made, have just basically kind of prepared me for this moment right now. When the time came that I was like, yo, I need help, everybody was just kind of like, yeah,
we're going to give you, you know, time, resource, money, whatever it is to make sure that this is
successful. And so all of that is like an input to the company. And then now like all of the,
you know, good or bad experiences that Ali and I have had as far as startup life, we just bring
in our design in the company that we want.
So I think that's the big difference between us and a lot of these more immature startups
is that we just understand.
We basically get recruiting now. We send out surveys at the end of our recruiting cycle, as well as a gift card, right? So if we do a pairing exercise, right, it's like, look, we'll just pay you for your time. We know you're taking three, four hours out of your day from your job, right? Why wouldn't we want to pay you for doing something that's meaningful work? And then on our surveys, you know, everybody's like, yo, even though I didn't get the job, this is one of the best recruiting experiences that we've ever had. Because me and Ali literally took a few days to say, okay, let's design the recruiting experience that we would have loved to have, right? And it's reflective, you know, towards the audience that we're going after. So it's just like all these little things, man.
And it's just like kind of preparedness
where we're going to be successful one way or another.
Yeah, yeah.
I think, based on my experience, to have a good opportunity you need three main ingredients: find the right timing, have the right team, and be in the right place. And it sounds like you have all three of them. So yeah, good luck, and I'm pretty sure that, one way or another, as you said, you're going to be successful. That's great. Okay, last question from my side. What about the name? How did you come up with the name? And also, if you can share some interesting facts about the company, something that's not well known and that you think is interesting to share with the people out there.
Yeah, the name is actually pretty...
Most people don't get it.
They just think it's some like made-up Silicon Valley name.
But it's a real deep story. Not real deep, but it's kind of an interesting story. So me and Ali both came from Salesforce, and Marc Benioff used to say data is the new oil all the time, right? Like, data is the new oil, data is the new oil. And so one night I was up watching National Geographic, and they were talking about the Dangote pipeline that's getting built in West Africa, and it's going to be the largest oil refinery in the world kind of thing. And one of the byproducts is jet fuel.
And the way that you remove impurities from jet fuel is the Merox process.
All right.
And so I immediately was like, yo, that's the name of our company. And then basically everything Merox was sold out. So I just basically added an A on the end of it. So it sounds like either a pharmaceutical company or, you know, Silicon Valley official. But kind of our unofficial tagline is: if data is the new oil, we want to power the refinery. So that's how we got Meroxa.
Yeah, that's great. That's great.
Very cool. What a great story. Love that.
Yeah, man. Everything has a meaning here, right?
Like that's what I said.
This is a purpose-built company.
Interesting fact about us:
I would say for all of the Kafka users out there,
we feel your pain around Kafka Connect,
and we will be open sourcing our Meroxa Connect, which is going to be written in Go.
And so now you don't have
to worry about using the JVM.
You can just deploy our Meroxa Connect thing in Kubernetes or whatever it is that you have.
It's automatically compliant with the Kafka API spec, so it basically just works.
Instead of the gigabyte of memory that you need for your JVM Kafka Connect, our instance uses, I think, something like an eight-megabyte footprint. So now you can just deploy it on Kubernetes and get automatic scale in that area. And then we are...
Yeah, so that's going to be coming out in October, November when we open source some of these data plane components.
So I think that's the coolest thing that we have so far.
Yeah.
I have a feeling that as time passes,
you'll have more and more interesting facts to share about Meroxa.
So thank you so much for today.
And I'm pretty sure that we'll have the opportunity quite soon to discuss again
and share more interesting facts around both the company and experiences around working with data.
Thank you so much, DeVaris, for today.
Yeah, man, thank you very much. Thanks for having me.
So that was it. I really enjoyed the conversation with Devaris.
It's very interesting to see and hear from a person
who is very dedicated and excited to build a new technology.
I'm also excited because, as I said, it's a very interesting concept to see CDC being applied at scale on databases. And as we heard from him, there are plenty of new use cases that it has enabled. And even though the company is pretty young, like it's a couple of months old, still, I mean, they have come up with some very interesting use cases that he shared with us.
The technology is great.
I mean, there are many different open source technologies
that are incorporated there from Kafka to Debezium.
There are some folks there that have huge experience in
interacting with all these
products.
And I'm sure that in the future we will have
the opportunity to discuss
even more about the technical aspect
of building a
product like Meroxa.
And
yeah, I mean, I think
it's just the beginning.
I'm pretty sure that from now on,
we will hear more and more about products
that are around real-time streams of data.
And I'm really looking forward to meeting again with DeVaris and hearing what they are going to be building in the next couple of months.
Me too.
I think, you know, one thing that we see as a trend in data engineering, really related to customer data in general, is that the warehouse has always been a key part of the data stacks that companies build. But with tools like Meroxa, which productize CDC in ways that really can be game-changing for other parts of the organization, it's been interesting to see this trend of the data warehouse sort of rising in ascendancy to be the king of the stack again, which is pretty neat.
And of course, that's driven by a number of things,
you know, new database technologies.
Also interesting to see how Meroxa came out of Heroku.
So product managers helping build features
and noticing that there's a big need in the market.
So need to see how DeVarious and his co-founder took big need in the market. So need to see how the various and his co-founder
took advantage of that as well.
And we probably need to circle up and have
sort of a good, bad and ugly related to Kafka
because I don't think we dug too deeply into that,
but they obviously have some strong opinions
and we know that there's strong opinions out there
on both sides with Kafka.
So maybe we can ask
his co-founder to come back on
and have a Kafka showdown discussion.
Absolutely. I think we are both looking forward to discussing again with the guys at Meroxa.