Drill to Detail - Drill to Detail Ep.68 ‘Confluent, Event-First Thinking and Streaming Real-Time Analytics’ With Special Guests Robin Moffatt and Ricardo Ferreira and Special Host Stewart Bryson
Episode Date: June 17, 2019

In this special edition of the Drill to Detail Podcast hosted by Stewart Bryson, CEO and Co-Founder of Red Pill Analytics, he is joined by Robin Moffatt and Ricardo Ferreira, Developer Advocates at Confluent, to talk about Apache Kafka and Confluent, event-first thinking and streaming real-time analytics.

Confluent download: https://www.confluent.io/download/
Demo: https://github.com/confluentinc/cp-demo/
Slack group: http://cnfl.io/slack
Mailing list: https://groups.google.com/forum/#!forum/confluent-platform
From Zero to Hero with Kafka Connect: http://rmoff.dev/ksldn19l-kafka-connect-slides
No More Silos: Integrating Databases and Apache Kafka: http://rmoff.dev/ksny19-no-more-silos
The Changing Face of ETL: Event-Driven Architectures for Data Engineers: http://rmoff.dev/changing-face-of-etl
Transcript
Hello everyone, thanks for joining us for another episode of Drill to Detail.
I'm your guest host, Stewart Bryson. Those of you that have listened to the show in the past may know me as a recurring guest. Mark has taken the day off from this recording for a couple of reasons: he wanted me to step in and do a guest host spot, and also the subject matter, Apache Kafka and the Confluent platform, is one that he knows I'm an avid fan of.
So here you are.
You've got a Yank recording on Drill to Detail.
So here we go.
We've got two great guests from Confluent who are joining us to discuss Apache Kafka and the Confluent platform.
First up is Robin Moffatt. He's a developer advocate at Confluent. Robin, why don't you
tell us a little bit about yourself and what you do there as a developer advocate for Confluent?
Sure, thanks. Thanks for having us on the show. So yeah, I'm a developer advocate,
which is, it's a really cool role. It's something I massively enjoy.
As the name implies, it's advocating for developers,
both kind of to them
and explaining how technology can help them,
but then also advocating for them back internally.
So taking feedback from developers,
acting as a developer and working with our products
and kind of feeding back to engineering and product and so on
about certain directions or functionality within the software itself.
So it's a lot of fun.
So it's writing blogs, doing talks,
working with developers in the community.
All the stuff we used to have to try to find time for
instead of it being our main job, right, Robin?
Yeah, and it's funny because it's the kind of thing
which I never knew actually existed.
I didn't realize that was kind of a job,
a profession that you could do in and of itself
until my now boss told me a couple of years ago
at OpenWorld, hey, go and have a look at this conference track.
And I went and sat in on it.
It was a whole bunch of talks all about developer relations.
And it's like, oh, wow, this thing actually exists.
So it's, yeah, it's really cool.
Yeah, it's so important today in the time of the cloud
because, you know, things are readily available,
but you're not really sure what exists
or what's the best route to getting to know them.
So Ricardo's been quiet as Robin and I have been bantering on.
So Ricardo, why don't you step up, tell us a little bit about yourself and what you do
there at Confluent as well.
Sure.
Thanks, Stewart.
Well, first of all, thanks for having me as well.
This is going to be my first time joining one of these episodes.
I'm also developer advocate at Confluent and like Robin explained it very well, our job
is essentially
to make sure developers know a bit more about what they can do with Kafka and Confluent platform,
as well as to help them to bring their struggle and complaints to our engineering teams and make
sure we always come up with better technology. I joined Confluent very recently, so I'm one of the newest developer advocates on the team. Before that, I was working at Oracle.
I spent eight years there.
And the funny story is that everybody asked me,
oh, you work at Oracle, so you must know a lot of databases
and ETL and all that stuff.
And I always tell them, yeah, but do you know what?
I don't know anything about it
because my background was more focused on middleware
and integration technology.
So I just kind of happened to work at Oracle,
which happens to work with some other technologies too,
although their baby is definitely the database.
So Ricardo, what do you think that says about,
you know, the three of us on this podcast today discussing Apache Kafka and the Confluent platform?
We all sort of have our histories in the Oracle world.
Is that just because Oracle was that prevalent in almost all spaces?
Or is there something specific about that background?
What do you think that leads us into some of these more modern technologies?
Yeah, I would agree that it has to do with the fact that Oracle was kind of prevalent everywhere, especially if we go back 20 years. And they definitely had a finger in how we manage and process data.
However, and like any technology, things evolve, right?
So we are always looking for better ways to do things, more efficiently and more effectively.
And I think the future kind of has a different plan for what we used to do 20 years ago.
And one of the things that we are seeking, especially at Confluent, is to manage data in a more stream-processing way.
So yeah, I think Oracle had something to do with our history here.
What a great lead-in to the next question.
So you both discussed the Confluent platform.
And Robin, why don't you step up and, you know, there's Apache Kafka,
there's the Confluent platform.
Do you want to, you know, take our listeners through
just a little bit of an overview of what those
are and what the differences are?
Yeah, sure.
So I guess paring it right back to kind of what Kafka is, because quite a lot of people
have heard of it and not everyone fully understands what it is.
So Kafka at its very, very heart is this idea of a distributed commit log.
And again, we can talk about logs and unbounded streams and stuff like that in a moment.
But taking that as a given, it's this distributed system that acts as an event streaming platform.
So it's got integration APIs.
It's got stream processing APIs.
And it's a project from the Apache Software Foundation. And around that,
there's Confluent Platform, which builds a bunch of pieces on top of it and gives you the tools
and technologies that you need to actually build and deploy projects and systems around Kafka
itself. So an example of that would be some of the connectors for enabling you to hook it up to databases, to Elasticsearch, to HDFS, to S3.
It gives you a schema registry.
So we can talk about that in more detail if we want to.
But being able to actually give you a way to kind of store and govern your schemas and your pipelines.
It gives you KSQL, which is super important.
And I guess particularly to our audience here, super interesting given the SQL nature of it.
And then you've got things like monitoring tools and development, web UIs.
You've got a whole bunch of stuff in there which makes up Confluent Platform.
Fantastic. Anything that you'd like to add on there, Ricardo?
No, I think Robin's explanation was definitely
both comprehensive and perfectly right. As usual. So moving on. So Robin, you discussed
stream processing there for a second. So I remember when I was first looking at Apache Kafka,
and it was always the discussion of the distributed commit log, which still rings
true, I think, and I wouldn't mind getting your opinion. But it's more than that now, isn't it?
I mean, what even Apache Kafka has on top of the core Kafka and what you guys at Confluent put on
top of it is all about this stream processing paradigm. Do you want to just talk to our listeners about, you know,
what does that mean, stream processing?
And if somebody hasn't experienced perhaps batch processing
or real-time processing,
what is really inherently different about stream processing?
So I think, if it's all right, I'll answer a slightly different question first, and then we can talk about stream processing. The bit to understand first is around events and this idea of an event streaming platform. Is that all right?
Yeah, let's do that. I'm feeding you the questions and the answers, but the stream processing bit makes sense.
I think one of the greatest, not mistakes, but things that took me a bit of a while to adjust to when I was learning about Kafka, coming from this world, this background of 15 years of working with batch systems, is that at first glance it can look like just another messaging system, and so you think, why would I care about it? And it's quite easy to dismiss it because of that.
Whereas what Kafka gives us and why Kafka is so powerful
is this concept of event-first thinking.
And events are actually what power a great deal of the data that we work with.
So events enable us to model the real world.
So in the same way that when we're building data warehouses,
we aggregate data up and that kind of makes things nice and fast to perform with.
But once you've aggregated it up, you can't go back from there.
So you've got your weekly summaries, but if you want to know your daily stuff, you have to have retained those base figures; otherwise you can't get back to it.
And in the same way, events are our raw data. Events are
actually what happens. And from events, we can aggregate up. We can create states, we can
determine what happened from those events. But unless we actually capture the events, we lose
some of the fidelity of the data. So Kafka acts as this event streaming platform that lets us capture events
and model events and do stream processing on events as well, which is why this answer kind
of comes before the next bit when you ask about stream processing. Because Kafka is this, it's
not only a distributed commit log, it's also an immutable commit log, which means you can't go
back and change it. So something happens and then something else happens. You can't go back in time and change
things. You might wish you could have done. You might wish we'd started the recording over or whatever, but sometimes things happen and then you want to go and change them. You actually can't do that if something's immutable, but because it's immutable,
that gives it great powers for reasoning about what you've got within it. So Kafka is this immutable event log,
something happens, something else happens. So to give it a kind of an idea around that,
if you think about an online website, you're kind of, you're placing an order on that website,
the traditional point of capturing that data, certainly from an analytics
point of view, and probably just from an application point of view, is what's in the basket?
So someone places the order: what's in the basket?
And we start to analyze the baskets and say, oh, people buy this, people buy that, people
buy whatever.
But what we lose in that are, well, how did they make that basket up?
What went into the basket?
And did they take
things out and change it? And all those events around, they put some baked beans into the basket
and then they put some bread into the basket and then they took the baked beans out and they put
tinned spaghetti in. All of those different things you lose if you don't capture the events.
And some people listening to this will say, oh, well, that's fine. We can also capture those things and we can write them somewhere else and store those in a database because we'd want to
analyze it. But that misses the point because then you're building in that bit specifically.
Whereas if you're capturing the events, you get all of that for free. And then you can decide,
well, we don't want to know the individual bits. We can just roll it up and you can roll it up,
but you can never go backwards from that initial capture.
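To make that concrete, here's a minimal sketch of what those basket events might look like as messages on a Kafka topic. The topic name, field names, and values are all hypothetical, just to illustrate capturing every change rather than only the final basket:

```json
{"event": "item_added",   "session": "s42", "item": "baked_beans",      "ts": "2019-06-17T10:01:04Z"}
{"event": "item_added",   "session": "s42", "item": "bread",            "ts": "2019-06-17T10:01:37Z"}
{"event": "item_removed", "session": "s42", "item": "baked_beans",      "ts": "2019-06-17T10:02:10Z"}
{"event": "item_added",   "session": "s42", "item": "tinned_spaghetti", "ts": "2019-06-17T10:02:15Z"}
{"event": "order_placed", "session": "s42", "ts": "2019-06-17T10:03:00Z"}
```

The final basket (bread and tinned spaghetti) can always be derived by replaying these events, but the events can never be reconstructed from the final basket.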
Yeah, that's a great explanation, Robin.
And it does mirror my experience as well,
trying to come up to speed with what you've described,
which is the concept of what an event is,
having spent so much time with relational databases
and looking at the layout of the data as it's recorded
or as it exists today and not necessarily the tiny bits of structure that caused it to build
up to the point it is today. And I think that's what you're talking about. Ricardo, is that a
similar experience? I don't believe you came from sort of the data warehousing background that Robin and
I did.
What's your experience with events and what that means to Apache Kafka and the Confluent
platform?
Sure.
Yeah.
No, definitely.
I didn't come from any data warehouse or database background. But the way I like to look at stream processing, and I think most people these days would agree with this, is in terms of two aspects. The first one is that, like any technology, we always kind of focus on what we can do at a given point in time in history. Like,
for example, if we go back 20 or 30 years, we had this concept of a database very well established, and pretty much every developer, DBA, or database architect was thinking of processing data in two steps. First, we have to acquire
and store, period, right? And then we could come up as developers with processing and applications
that would query the data that is stored fundamentally
and bring it to memory to start processing. So if you think about it, it's a two-step process.
It takes time. It introduces latency, as Robin likes to explain in his presentations, which are pretty good. And we've spent the last 30 years doing things like this pretty much because
this is how the database technology works.
I mean, they were meant to store data and process data later, period, right? But then something
starts changing. And that something is the need for some companies and organizations these days to
not only to acquire and store data at a given point in time, but at the same time, not one day, not one week, not one month later, but at the same time, to start processing it as well.
And the need for this is to give proper near-real-time insight, or to come up with some actionable insight that would change some outcome of the business.
And that, I think, is the heart of stream processing.
So it's two key pieces.
The first one is the evolution of the technology.
So coming up with a new type of database, let's call them streaming databases, that are able to
store data and process data as the event is in motion to feed the use cases that people
are kind of looking for more frequently these days,
which is, take for instance, Uber.
If you think about Uber, it's all about bringing static data,
the information about the passenger and the driver,
as well as the data that is in motion, such as their position and GPS position,
and blending them together in such a way that you can actually come up with,
hey, that means the driver is two minutes away from me. So that's the type of motivation for stream processing. And that's the way I like
to see it. And pretty much, I think everybody will agree with that. That's the future of how we see and process data. That's great, Ricardo. And I think that leads in well to our next
discussion, which is, I wanted to talk about when you start to think
about a new platform or a new piece of software, it's difficult to just inject that into a current
environment. And I think that's where Kafka Connect comes in. I think one of the reasons that it was
so easy for me personally to get up and running with Kafka is it was so easy to get data into it
from a bunch of different systems. Robin, do you want to talk about what Kafka Connect is and why, in my mind,
it's a big differentiator? And maybe you could tell us a little bit about what that means.
Yeah, sure. Kafka Connect is one of my favorite bits because it brings my previous experience with databases into my passion for Kafka, because it's part
of Apache Kafka and it acts as an integration API, basically, streaming integration, both
with systems upstream, where you want to pull data, you want to stream data into Kafka,
and also for taking data from Kafka and pushing it out to other places.
So, for example, you've got a bunch of data sat in a database,
in flat files, on message queues, and you want to get that into Kafka.
Maybe you want to get it into Kafka because you then want to push it down somewhere else. So just building a pipeline, maybe doing some kind of database offload for analytics,
but also for getting data into Kafka to then drive event-driven applications that want to respond to something happens somewhere else and we want to be able to respond to that.
Or you want to do some processing on it and use something like KSQL to actually build stream processing applications against this data.
So Kafka Connect's actually dead easy to use because it's just configuration files. You say, I've got data in this place over here, bring it into this topic, or you've
got data in this topic here, push it out to that place.
And there are hundreds of different connectors.
There's connectors from Confluent Platform, there's connectors from software partners
like Oracle, there's also connectors from the community.
So you find the connector for your particular technology, whether it's a database, whether it's Elasticsearch, whether it's Influx, whether it's whatever technology,
and you simply plug that into Kafka Connect and set up the configuration file and off you go.
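As a rough illustration, a Kafka Connect source connector is configured with a small properties or JSON file like the one below. This is a hedged sketch based on the Confluent JDBC source connector; the connection details, table name, and credentials are hypothetical:

```json
{
  "name": "jdbc-source-orders",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:postgresql://db-host:5432/shop",
    "connection.user": "connect_user",
    "connection.password": "********",
    "mode": "incrementing",
    "incrementing.column.name": "order_id",
    "table.whitelist": "orders",
    "topic.prefix": "db-"
  }
}
```

Posting this to the Kafka Connect REST API starts a connector that streams new rows from the `orders` table into a `db-orders` topic, with no bespoke code.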
So that was a great introduction to Kafka Connect, Robin. Are there other ways that,
or other integration points or other ways that people might get either data into Kafka
or out of Kafka? Yeah, sure. So Kafka Connect is definitely what people use where it's kind of like
it's a given existing technology. So I want to plug it into a database. You use Kafka Connect.
You definitely don't want to sit there writing your own programs to pull data when that wheel already exists. There's no need to go and reinvent it.
But you also see where people have more bespoke systems or applications. There's a huge number of client libraries for Java and C and.NET and so on, where people can actually integrate
directly into their applications. There's also a REST proxy. So if you've got an application
that wants to pull or push data to and from Kafka and wants to do so over HTTP, you can use the REST proxy to do that as well.
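For example, here's a hedged sketch of producing a message over HTTP via the Confluent REST Proxy (the topic name and payload are made up; the proxy listens on port 8082 by default):

```bash
# Produce one JSON message to a hypothetical 'orders' topic via the REST Proxy
curl -X POST http://localhost:8082/topics/orders \
  -H "Content-Type: application/vnd.kafka.json.v2+json" \
  -d '{"records": [{"value": {"order_id": 1, "amount": 99.5}}]}'
```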
And just for a point of clarification, just so our listeners understand. So you mentioned the
client libraries and those are Apache Kafka proper. The REST proxy though, that's Confluent
platform, is that correct? So the Java client libraries are part of Apache Kafka
and then Confluent has a bunch of different client libraries built around librdkafka for C, C++, .NET, Python, and the REST proxy as well, as part of Confluent Platform.
Fantastic. So Robin, you mentioned KSQL just a few minutes ago, which is reasonably new; it's the new kid in the Confluent platform.
Do you want to tell us a little bit about what that is and how people are using it,
how your customers are using it?
Yeah, definitely.
So KSQL, it's a SQL interface that enables you to build streaming applications on top of your data in Kafka.
So I suppose what it isn't, and it's kind of important to get this out of the way,
is that it is not a way of hooking up Tableau or whatever your analytics visualization tool
of choice is. It's not a way to hook that up to Kafka. I mean, you could do, and there is a community JDBC driver for it, but that's not what KSQL
is about.
So I'm saying that up front just because it's important to set expectations and understandings
about what KSQL is.
KSQL is for building stream processing applications.
It's so cool because if you think about the kind of ways in which people work with data, more often than not, they will use SQL to explore it.
They will be writing SQL statements to say, I'm going to take this lump of data in my data warehouse.
I'm going to filter it.
I'm going to look for this kind of condition.
And those are the interesting insights that you're pulling out of the data. You can take that SQL statement, with those WHEREs and HAVINGs and GROUP BYs and so on, and you can run that as a KSQL statement to not only act on all of your existing
data in Kafka, but also all of the data as it arrives. And when KSQL runs a SQL statement,
it's a continuous query. So unlike when you go and query Oracle or you go and query Postgres or
whatever, it's a
static query. You run the query and you get some data back. Well, you don't get data back, but you
get a result. And then you have to rerun it if you want to know if the data changed. With KSQL,
it's a continuous query because it's running against Kafka and Kafka is unbounded. It's an
infinite stream of data. So there may be no new messages arriving at the moment, but there may be some more coming
in five minutes, 10 minutes, a year, who knows, but it's unbounded. So KSQL queries run continually. And the output of a KSQL query goes into a new Kafka topic. You can have it echo to your
console instead if you want, but when you're actually building these stream processing
applications, it's writing the outputs to a Kafka topic.
And because it's a Kafka topic,
that means it can be consumed by pretty much anything because everything integrates with Kafka.
So you can use KSQL to build out very complex stream processing applications. You can also use KSQL to simply build out building blocks of stream processing,
which filter a topic here,
join two topics over there,
aggregate this data here, and consume the results from that in your own applications,
in data stores downstream for analytics. But however you want to consume your data out of
Kafka, you can enrich and modify that data as it passes through Kafka using KSQL.
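To ground that, here's a minimal KSQL sketch of the kind of continuous query Robin describes. The topic, stream, and column names are hypothetical:

```sql
-- Register an existing Kafka topic as a KSQL stream
-- (the 'orders' topic and its columns are assumptions for illustration)
CREATE STREAM orders (order_id VARCHAR, amount DOUBLE, region VARCHAR)
  WITH (KAFKA_TOPIC='orders', VALUE_FORMAT='JSON');

-- A continuous query: runs forever, filtering every existing and future
-- event and writing results to a new Kafka topic backing 'big_orders'
CREATE STREAM big_orders AS
  SELECT order_id, amount, region
  FROM orders
  WHERE amount > 100;
```

Unlike a one-off query against Oracle or Postgres, that second statement keeps running, and `big_orders` is just another Kafka topic that anything downstream can consume.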
Yeah, I think that was a huge benefit when you
start thinking about, A, that people are used to SQL as a way to take a look at data. But also,
Robin, to someone from our background, we're used to SQL being the language by which we do process data. And I think for Confluent to have acknowledged that and given
a layer that allows us to not only query, but also process using SQL is a big differentiator.
The other way that we would think about processing data within Apache Kafka is Kafka Streams. Do you want to talk a little bit about
the relationship between Kafka Streams and KSQL? Yeah, sure. So KSQL is built on top of Kafka
Streams. So Kafka Streams is an API within Apache Kafka. KSQL is part of Confluent Platform.
So KSQL will build out a Kafka Streams topology and execute using Kafka Streams.
Kafka Streams is, I suppose, like a lower level API, or rather KSQL is a higher level
abstraction on top of Kafka Streams.
If you're writing Java, if you want to do stream processing within your Java application,
you can bring in Kafka Streams as a library and do your filtering, your enrichment,
your transformations, your aggregations within your Java application and deploy it in exactly
the same way you deploy your Java applications. You don't need to have a new cluster specifically
for your stream processing and so on. You just write your Java applications as before.
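For a rough sense of what that looks like in practice, here is a hedged Java sketch of a Kafka Streams application doing the same kind of filtering as the KSQL example above; the topic names, broker address, and predicate are all assumptions:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class BigOrdersApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "big-orders-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("orders")                       // consume each event as it arrives
               .filter((key, value) -> value != null)  // placeholder predicate; real logic would parse the amount
               .to("big-orders");                      // write matching events to another topic

        // Runs inside this ordinary JVM; no separate processing cluster needed
        new KafkaStreams(builder.build(), props).start();
    }
}
```

Because it's just a library, this deploys like any other Java application, which is exactly the trade-off Ricardo picks up on next.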
Ricardo, anything you'd like to add to the KSQL discussion?
Yeah, actually, picking up what Robin was saying about the relationships and the differences between Kafka Streams and KSQL, there is another kind of architectural motivation for why KSQL
exists.
And it's a minor one, but it's very important when you are thinking of doing stream processing within your development team.
If you think about it, Kafka Streams is a Java library, a Java or a Scala library that developers bring into their applications, and when they're finished writing their applications, those are going to become JVMs, right? Runtime processes that will bring data into memory and do the stream processing on it. But
there is a problem with that. I mean, although it's cool to bring stream processing into your applications, if you are doing some intense aggregation, you might end up with a very bloated, very large-heap JVM, which is going to incur a lot of memory problems such as garbage collection and stop-the-world pauses. And that's not going to be very pleasant for the application itself.
So one of the architectural motivations for why KSQL exists is, not only, like Robin explained, providing a DSL, a SQL-based language that abstracts the whole stream processing away from Java, but also having its own dedicated cluster where the stream processing runs in its own JVMs, separated from your applications. That way, you can scale out your workloads in a stream processing layer that is different from your application layer. So at the end of the day, KSQL is also a solution for a scalability problem that might arise when you're building stream processing applications.
So there you have it.
Great.
So that's great to understand.
I know that when I first started
looking at Apache Kafka
some years ago,
the standard sort of architecture,
at least the use cases I saw,
were often with Apache Kafka
feeding Spark applications.
How does today's lineup of sort of solutions inside of both the Apache Kafka ecosystem
and the Confluent platform, how does that sit next to Spark clusters and Spark distributions?
When would you go sort of in one direction versus the other?
Either one of you guys want to jump in on that?
Yeah, I have some opinions about this design pattern using Spark and KSQL or Kafka Streams.
I mean, my main inclination for using Kafka Streams or KSQL is that they were built on top of the consumer API, which is a battle-proven technology that provides you the whole partitioning model and the scalability model. In the event of broker failures or consumer group failures, you have the whole rebalancing protocol taking action. So by having this framework layer built on top of something that's proven, like the consumer API, I think we can provide a very similar experience for KSQL developers, who don't have to worry about those details, right? And what I see in other stream processing frameworks, and I'm not saying they are bad or good, is that those building blocks, let's call them building blocks, kind of become more visible to the developers. So they're exposed to those complexities, and somehow they need to solve them by themselves when they are dealing with some other framework. Right.
Of course, Spark Streaming is also based on some sort of consumer API from Kafka; at a very underlying level, it leverages the same APIs. But I'm pretty sure that some of those building blocks often come up when you are doing Spark Streaming and micro-batching, because the semantics of the processing are different.
So I think the main difference is how those building blocks are abstracted for the upper development layer. And that's one of the things that KSQL and Kafka Streams do very well, which is abstracting the underlying complexity around partitioning, scalability, rebalancing, and failover. Excellent. Robin, did you have any feedback on that or any follow-up on that?
It comes up pretty much all the time when we talk about Kafka Streams and KSQL.
I suppose, just on top of what Ricardo said, sometimes it's going to be a more mundane reason, which is just that an existing technology is there. And so if someone's already using Flink, or they're already using Spark Streaming, I wouldn't particularly advocate going and ripping it out and replacing it, because that's fairly pointless unless there is a specific thing which it doesn't do that one of the others does.
So it's like with all technologies, it's always fun to kind of use something different.
There's pretty much feature parity on most things across most of these tools.
Some of the older ones are kind of less frequently updated nowadays
and a bit long in the tooth, so you may not opt for those.
I think it's when you're starting from a greenfield and you think,
well, my data is going to be in Kafka.
It's definitely going to be in Kafka.
It's an event-driven system, so we're using Kafka.
That's step number one.
Step number two is how much broader do I want my technology footprint to be? And you might think,
well, I'm going to use this other thing for this particular reason, and that's fine. But I think
my guiding rule on this always is like, well, I'd start off with what's in the box already.
So I've already got Kafka Streams. If I want to use SQL with it, I've got KSQL on top of it
and kind of broaden out from there.
Yeah, I think that makes sense.
It's like, do I really need another cluster?
I mean, that's what you're talking about.
Spark's not just another library that you add.
It's actually another cluster you add.
So I think the question that I always talk to customers about
is do you need another cluster?
Maybe you do.
And if you do, then let's build one.
But if you don't, let's keep the one cluster we have.
Does that sound about right? So, did you want to follow up on that, Ricardo?
No, just a quick comment about some of the motivations that we've seen at Confluent about
why some customers kind of choose Spark Streaming.
It's not necessarily a technical motivation; sometimes it's more that they are a development firm specialized in doing Spark Streaming, and they kind of have their own task force specialized in that technology. So sometimes it's not just about choosing the technology because it is best, but because of the knowledge they have for doing stream processing. So they settle for where the data is, which is Kafka,
and they use Spark or Flink,
because that is the technology that they have been building,
they are processing for the last five years.
So sometimes it's some market trends that we've seen
that's not necessarily tied to technical aspects.
Yeah, and that's a great lead-in to, you know,
we were sort of at a high level talking about
trying to make developers' lives easier,
at least architecturally,
making it easier to run systems.
And I think that's a good lead-in to the Confluent Cloud.
And, you know, there's been some really great announcements.
It seems like you guys just keep hitting them with more announcements around Confluent Cloud and different options and
more availability and et cetera. Ricardo, do you want to maybe just give us a high level
of where Confluent sits with your cloud offerings and sort of give us a lay of the land
so that we can see the different ways
we might think about that.
Sure, sure.
I think that the best way to explain this
and Confluent Cloud is to discuss a little bit
what managed services are.
If you think about it, if you go back 10 years to when the whole cloud thing kind of started, we were thinking in fundamental building blocks, which is basically infrastructure, making sure infrastructure was so easy to consume that developers could focus on what they do best, which is writing code, and PaaS, platform-as-a-service components, where not only was infrastructure provided as a service, but also pre-built frameworks and components that developers could simply spin up and use to shorten their development times.
So one great example that I would like to give about managed services is BigQuery from
GCP, for example. So if you want to work with a terabyte-scale kind of database and you don't want to worry about how to set it up, how to install it, patch it, secure it, or manage it,
you can simply go to the GCP console or CLI
and spin up your new BigQuery table.
And there you have it.
I mean, five minutes later, you have an up-and-running terabyte-scale database that you can start using and hooking up to your application.
So although some people kind of say that cloud computing is all about reducing costs, my take on this is that cloud computing is also about making sure you become truly agile.
So you build things faster and managed services are a very good indication where we are leading towards that direction.
So going back to your original question, Stewart, what is Confluent Cloud?
It is a managed service.
So it's a way to offer our customers Apache Kafka as something where they don't necessarily have to worry about how to install it, how to patch it, how to provision it, how to manage it,
how to scale it. And the end result of this is that, hey, five minutes later, you don't worry
about the Kafka cluster anymore. You don't worry about schema registry anymore. You don't worry
about some of the services that we are introducing into Confluent Cloud such as ksql or Kafka Connect,
and you jump straight to what really matters and what really provides value to
the organization, which is building applications. Yeah, I agree. I tweeted the other day that your newest offering is just, give me an API and some SQL. As a developer, that's all I need.
Robin, what does that really mean for developers? And you guys are both in the line of work
where you're trying to ease the friction for developers.
What does it really mean for a developer to have these kind of options
to bring Kafka into their architecture?
So I suppose by making it all available as a managed service,
it's one less thing to have to worry about
and get set up before you can actually start being productive.
So if you've got your data flowing through Kafka,
which obviously persists the data for as long as you want it,
as a developer, you can now spin up a KSQL instance
and start writing your streaming queries against that data
and transforming it and enriching it and writing it somewhere else
without first having to worry about setting up a cluster,
managing that cluster, and so on.
Yeah, excellent.
And so you guys had a big announcement at Google Cloud Next this year.
Ricardo, do you want to talk a little bit about what that announcement was
and what it means for listeners of this podcast?
Sure, sure.
So the announcement pretty much was that GCP, Google Cloud, partnered with some very strategic companies to bring managed versions of their key open source technologies. So I'm going to mention two of them. The first one is Redis. So GCP is providing Redis clusters as a first-class system, and pretty much what they did was partner up with Redis Labs, which is going to take care of the whole clustering and provisioning for them.
But what is more important from that is that for the GCP customer or user, they're going to be able
to spin up Redis clusters straight from their GCP console or CLI.
Same goes for Apache Kafka clusters.
So what GCP did was partner up with Confluent.
So pretty much all the clusters that developers will spin up from the GCP console or CLI will actually be provisioned by Confluent and, more importantly, managed by Confluent. So what that means for the users is peace of mind that their experience within that
specific cloud provider is going to be all the same.
So with the same simplicity that they spin up BigQuery tables, they're going to spin up Apache
Kafka clusters.
So that is a value that the GCP as a cloud provider is bringing to the users, which is
pretty cool.
But more important than that is the relationship, how GCP is outsourcing part of their cloud experience to what we call the domain experts. I mean, Confluent is known for having pretty deep knowledge of how to provision Kafka clusters in the cloud. So it's kind of a smart move from Google to rely on domain experts for doing that instead of building those cloud services by themselves, which is not very scalable because, again, they're not experts on Apache Kafka.
So that's pretty much what the announcements were.
Yeah, that's great.
I mean, Mark spends a lot of time on this podcast talking about offerings inside of GCP, and the listeners to this podcast are regularly hearing about how to make their lives easier and how the cloud can make their lives easier. And I think this last announcement really does speak to what it means to really make Kafka available, and
Confluent in general available to a much wider audience, those organizations that may be smaller
or don't have infrastructure and don't have the expertise to run big systems.
Now it's really just a few clicks away. So I think we're going to wrap up here.
And so, Robin, you want to tell the listeners
how perhaps they might find out more
about Apache Kafka and the Confluent platform
and Confluent in general?
Yeah, so confluent.io is our website.
You can go and download it from there. We've got a bunch of quick start tutorials if you want to try it for yourself.
We've got an examples repository on our GitHub.
You can go and try them there.
There's one called Demo Scene as well.
It's all on Docker,
so it's easy to just spin the whole thing up.
So there's some good places to get started.
And Ricardo, anything to add there? Our listeners may want to try some things out.
Yeah, I actually would like to recommend taking a look at Confluent Cloud. I mean, it's very, very easy to start using Kafka through that route. I mean, if you go to Confluent Cloud and create an account, I guarantee you that five minutes later, you will have your Kafka cluster running. So I think it's important for developers that are trying to focus on the developing part to go into Confluent Cloud, which is basically confluent.io/cloud; you're going to end up there. And we do have lots of repositories that show code pointing to Confluent Cloud. So I think that's our point: to make developers' lives easier as we go about our jobs.
Robin, anything to add on that?
Yeah, just one more thing that I forgot to mention originally.
We've got a Slack group, the Confluent Community Slack group. There's like 9,000 people on there. There's tons of people
from the community. There's Confluent people on there. So that's a great place to go.
If you've got questions about specifics of this, there's different channels for each different
part of Confluent platform. There's also a mailing list and there's also Stack Overflow
and places like that as well. Some good resources there.
That's fantastic.
So we'll make sure that we put the Docker links, the Confluent Cloud links, and also the Slack links in the show notes so that our listeners can get to that easily.
So Robin, Ricardo, really appreciate you guys taking some time today to join us on the Drill
to Detail
podcast. Thanks again. And for Mark Rittman, this is Stewart Bryson. Thanks for listening. Thank you.