Drill to Detail - Drill to Detail Ep.68 ‘Confluent, Event-First Thinking and Streaming Real-Time Analytics’ With Special Guests Robin Moffatt and Ricardo Ferreira and Special Host Stewart Bryson

Episode Date: June 17, 2019

In this special edition of the Drill to Detail Podcast, guest host Stewart Bryson, CEO and Co-Founder of Red Pill Analytics, is joined by Robin Moffatt and Ricardo Ferreira, Developer Advocates at Confluent, to talk about Apache Kafka and Confluent, event-first thinking and streaming real-time analytics.

Confluent Download: https://www.confluent.io/download/
Demo: https://github.com/confluentinc/cp-demo/
Slack group: http://cnfl.io/slack
Mailing list: https://groups.google.com/forum/#!forum/confluent-platform
From Zero to Hero with Kafka Connect: http://rmoff.dev/ksldn19l-kafka-connect-slides
No More Silos: Integrating Databases and Apache Kafka: http://rmoff.dev/ksny19-no-more-silos
The Changing Face of ETL: Event-Driven Architectures for Data Engineers: http://rmoff.dev/changing-face-of-etl

Transcript
Starting point is 00:00:00 Hello everyone, thanks for joining us for another episode of Drill to Detail. I'm your guest host, Stewart Bryson. Those of you that have listened to the show in the past may know me as a recurring guest. And Mark has taken the day off from this recording for a couple of reasons: he wanted me to step in and do a guest host, and also the subject matter, which is Apache Kafka and the Confluent platform, is one that he knows I'm an avid fan of. So here you are. You've got a Yank recording on Drill to Detail. So here we go.
Starting point is 00:00:58 We've got two great guests from Confluent who are joining us to discuss Apache Kafka and the Confluent platform. First up is Robin Moffatt. He's a developer advocate at Confluent. Robin, why don't you tell us a little bit about yourself and what you do there as a developer advocate for Confluent? Sure, thanks. Thanks for having us on the show. So yeah, I'm a developer advocate, which is, it's a really cool role. It's something I massively enjoy. As the name implies, it's advocating for developers, both kind of to them and explaining how technology can help them,
Starting point is 00:01:35 but then also advocating for them back internally. So taking feedback from developers, acting as a developer and working with our products and kind of feeding back to engineering and product and so on about certain directions or functionality within the software itself. So it's a lot of fun. So it's writing blogs, doing talks, working with developers in the community.
Starting point is 00:01:57 All the stuff we used to have to try to find time for instead of it being our main job, right, Robin? Yeah, and it's funny because it's the kind of thing which I never knew actually existed. I didn't realize that was kind of a job, a profession that you could do in and of itself until my now boss told me a couple of years ago at OpenWorld, hey, go and have a look at this conference track.
Starting point is 00:02:19 And I went and sat in on it. It was a whole bunch of talks all about developer relations. And it's like, oh, wow, this thing actually exists. So it's, yeah, it's really cool. Yeah, it's so important today in the time of the cloud because, you know, things are readily available, but you're not really sure what exists or what's the best route to getting to know them.
Starting point is 00:02:40 So Ricardo's been quiet as Robin and I've been bantering on. So Ricardo, why don't you step up, tell us a little bit about yourself and what you do there at Confluent as well. Sure. Thanks, Stuart. Well, first of all, thanks for having me as well. That's going to be my first joint in these episodes. I'm also developer advocate at Confluent and like Robin explained it very well, our job
Starting point is 00:03:04 is essentially to make sure developers know a bit more about what they can do with Kafka and Confluent platform, as well as to help them to bring their struggle and complaints to our engineering teams and make sure we always come up with a better technology. Before I joined Confluent, it happened very recently. I'm one of the youngest developer advocates on the team. And I was working at Oracle. I spent eight years there. And the funny story is that everybody asked me,
Starting point is 00:03:36 oh, you work at Oracle, so you must know a lot of databases and ETL and all that stuff. And I always tell them, yeah, but do you know what? I don't know anything about it because my background was more focused on midware and integration technology. So I just kind of happened to work at Oracle, which happens to work with some other technologies too,
Starting point is 00:03:57 although their baby is definitely the database. So Ricardo, what do you think that says about, you know, the three of us on this podcast today discussing Apache Kafka and the Confluent platform? We all sort of have our histories in the Oracle world. Is that just because Oracle was that prevalent in almost all spaces? Or is there something specific about that background? What do you think that leads us into some of these more modern technologies? Yeah, I would agree that it has to do with the fact
Starting point is 00:04:27 that Oracle was kind of a prevalence everywhere, especially if we go back to 20 years ago. And they definitely had a finger on how do we manage and process data. However, and like any technology, things evolve, right? So we are always looking for better ways to do things and more efficiently and more effective. And I think the future kind of has a different plan for what we used to do 20 years ago. And one of the things that we are seeking here is, especially at Confluent and everywhere,
Starting point is 00:05:00 is to manage data in a more stream processing way. So yeah, I think Oracle had something to do with our history here. What a great lead-in to the next question. So you both discussed the Confluent platform. And Robin, why don't you step up and, you know, there's Apache Kafka, there's the Confluent platform. Do you want to, you know, take our listeners through just a little bit of an overview of what those
Starting point is 00:05:27 are and what the differences are? Yeah, sure. So I guess pairing it right back to kind of what Kafka is, because quite a lot of people have heard of it and not everyone fully understands what it is. So Kafka at its very, very heart is this idea of a distributed commit log. And again, we can talk about logs and unbounded streams and stuff like that in a moment. But taking that as a given, it's this distributed system that enables you and acts as an event streaming platform. So it's got integration APIs.
Starting point is 00:05:59 It's got stream processing APIs. And it's a project from the Apache Software Foundation. And around that, there's Confluent Platform, which builds a bunch of pieces on top of it and gives you the tools and technologies that you need to actually build and deploy projects and systems around Kafka itself. So an example of that would be some of the connectors for enabling you to hook it up to databases, to Elasticsearch, to HDFS, to S3. It gives you a schema registry. So we can talk about that in more detail if we want to. But being able to actually give you a way to kind of store and govern your schemas and your pipelines.
Starting point is 00:06:40 It gives you ksql, which is super important. And I guess particularly to our audience here, super interesting given the SQL nature of it. And then you've got things like monitoring tools and development, web UIs. You've got a whole bunch of stuff in there which makes up Confluent Platform. Fantastic. Anything that you'd like to add on there, Ricardo? No, I think Robin's explanation was definitely both comprehensive and perfectly right. As usual. So moving on. So Robin, you discussed stream processing there for a second. So I remember when I was first looking at Apache Kafka,
Starting point is 00:07:21 and it was always the discussion of the distributed commit log, which still rings true, I think, and I wouldn't mind getting your opinion. But it's more than that now, isn't it? I mean, what even Apache Kafka has on top of the core Kafka and what you guys at Confluent put on top of it is all about this stream processing paradigm. Do you want to just talk to our listeners about, you know, what does that mean, stream processing? And if somebody hasn't experienced perhaps batch processing or real-time processing, what is really inherently different about stream processing?
Starting point is 00:08:00 So I think if it's all right, I'll answer a slightly different question and we can talk about stream processing but the kind of the the bit to understand first is around the events and this idea of an event room platform because is that all right i kind of yeah let's do that i'm feeding the questions and the answers but the the stream processing bit makes sense but i think one of the greatest not mistakes but things that kind of took me a bit of while to adjust to when I was learning about Kafka from this world, this kind of like back in 15 years of working with batch systems, is that Why would I care about it? And it's quite easy to dismiss it because of that. Whereas what Kafka gives us and why Kafka is so powerful is this concept of event-first thinking. And events are actually what power a great deal of the data that we work with.
Starting point is 00:09:00 So events enable us to model the real world. So in the same way that when we're building data warehouses, we aggregate data up and that kind of makes things nice and fast to perform with. But once you've aggregated it up, you can't go back from there. So you've got your weekly summaries, but you want to know your daily stuff. You have to have retained those base figures, otherwise you can't go back from it. And in the same way, events are our raw data. Events are
Starting point is 00:09:26 actually what happens. And from events, we can aggregate up. We can create states, we can determine what happened from those events. But unless we actually capture the events, we lose some of the fidelity of the data. So Kafka acts as this event streaming platform that lets us capture events and model events and do stream processing on events as well, which is why this answer kind of comes before the next bit when you ask about stream processing. Because Kafka is this, it's not only a distributed commit log, it's also an immutable commit log, which means you can't go back and change it. So something happens and then something else happens. You can't go back in time and change things. You might wish you could have done. You might wish that we kind of like started recording
Starting point is 00:10:13 and whatever, but sometimes things happen and then you have to kind of, you want to go and change this. You actually, you can't do that if something's immutable, but because it's immutable, that gives it great powers for reasoning about what you've got within it. So Kafka is this immutable event log, something happens, something else happens. So to give it a kind of an idea around that, if you think about an online website, you're kind of, you're placing an order on that website, the traditional point of capturing that data, certainly from an analytics point of view, and probably just from an application point of view, is what's in the basket? So one places the order, what's in the basket?
Starting point is 00:10:53 And we start to analyze the baskets and say, oh, people buy this, people buy that, people buy whatever. But what we lose in that are, well, how did they make that basket up? What went into the basket? And did they take things out and change it? And all those events around, they put some baked beans into the basket and then they put some bread into the basket and then they took the baked beans out and they put tin spaghetti in. All of those different things you lose if you don't capture the events.
Starting point is 00:11:21 And some people listening to this will say, oh, well, that's fine. We can also capture those things and we can write them somewhere else and store those in a database events. And some people listening to this will say, oh, well, that's fine. We can also capture those things and we can write them somewhere else and store those in a database because we'd want to analyze it. But that misses the point because then you're building in that bit specifically. Whereas if you're capturing the events, you get all of that for free. And then you can decide, well, we don't want to know the individual bits. We can just roll it up and you can roll it up, but you can never go backwards from that kind of the initial capture. Yeah, that's a great explanation, Robin.
Starting point is 00:11:50 And it does mirror my experience as well, trying to come up to speed with what you've described, which is the concept of what an event is, having spent so much time with relational databases and looking at the layout of the data as it's recorded or as it exists today and not necessarily the tiny bits of structure that caused it to build up to the point it is today. And I think that's what you're talking about. Ricardo, is that a similar experience? I don't believe you came from sort of the data warehousing background that Robin and
Starting point is 00:12:26 I did. What's your experience with events and what that means to Apache Kafka and the Confluent platform? Sure. Yeah. No, definitely. I hadn't came from any data warehouse or database background. But the way I like to look at string processing, and that's how I think most
Starting point is 00:12:46 people these days could agree with, is that about two aspects. The first one is, like any technology, I think we always kind of focus on what we can do in a given point of time in history. Like, for example, if we go back 20 or 30 years ago, we had this concept of database very well established and pretty much every developer, DBA or database architect were thinking in two ways to process data. First, we have to acquire and store, period, right? And then we could come up as developers with processing and applications that would query the data that is stored fundamentally and bring it to memory to start processing. So if you think about it, it's a two-step process. It takes time. It introduces bad latency, as Robin liked to explain on his presentation,
Starting point is 00:13:37 which is pretty good. And we've spent the last 30 years doing things like this pretty much because this is how the database technology works. I mean, they were meant to store data and process data later, period, right? But then something starts changing. And that something is the need for some companies and organizations these days to not only to store, acquire, and store data in a given point in time, but at the same time, not one day, not one week later, not one month later, but at the same time, not one day, not one week later, not one month later, but at the same time to start processing it as well. And the need for this is for giving proper near real time insight or to come up with some actionable insight that would change some outcome of the business.
Starting point is 00:14:20 And that is, I think, is the heart of string processing. So it's two key pieces. The first one is the evolution of the technology. So coming up with new type of databases, let's call streaming databases, that are able to store data and process data as the event is in motion to feed the use cases that people are kind of looking for more frequently these days, which is, take for instance, Uber. If you think about Uber, it's all about bringing static data,
Starting point is 00:14:50 the information about the passenger and the driver, as well as the data that is in motion, such as their position and GPS position, and blending them together in such a way that you can actually come up with, hey, so that means that the driver is two minutes away from me. So that's the type of motivation for string processing. And that's the way I like to see. And pretty much, I think everybody will agree with that. That's the future about how do we see and process data. That's great, Ricardo. And I think that leads in well to our next discussion, which is, I wanted to talk about when you start to think about a new platform or a new piece of software, it's difficult to just inject that into a current
Starting point is 00:15:34 environment. And I think that's where Kafka Connect comes in. I think one of the reasons that it was so easy for me personally to get up and running with Kafka is it was so easy to get data into it from a bunch of different systems. Robin, do you want to talk about what Kafka Connect is and why, in my mind, it's a big differentiator? And maybe you could tell us a little bit about what that means. Yeah, sure. Kafka Connect is one of my favorite bits because it brings in my previous experience with databases into my passion for Kafka, because it's part of Apache Kafka and it acts as an integration API, basically, streaming integration, both with systems upstream, where you want to pull data, you want to stream data into Kafka, and also for taking data from Kafka and pushing it out to other places.
Starting point is 00:16:29 So, for example, you've got a bunch of data sat in a database, in flat files, on message queues, and you want to get that into Kafka. Maybe you want to get it into Kafka because you then want to push it down somewhere else. So just building a pipeline, maybe doing some kind of database offload for analytics, but also for getting data into Kafka to then drive event-driven applications that want to respond to something happens somewhere else and we want to be able to respond to that. Or you want to do some processing on it and use something like KSQL to actually build stream processing applications against this data. So Kafka Connect's actually dead easy to use because it's just configuration files. You say, I've got data in this place over here, bring it into this topic, or you've got data in this topic here, push it out to that place. And there are hundreds of different connectors.
Starting point is 00:17:13 There's connectors from Confluent Platform, there's connectors from software partners like Oracle, there's also connectors from the community. So you find the connector for your particular technology, whether it's a database, whether it's Elasticsearch, whether it's Influx, whether it's whatever technology, and you simply plug that into Kafka Connect and set up the configuration file and off you go. So that was a great introduction to Kafka Connect, Robin. Are there other ways that, or other integration points or other ways that people might get either data into Kafka or out of Kafka? Yeah, sure. So Kafka Connect is definitely what people use where it's kind of like it's a given existing technology. So I want to plug it into a database. You use Kafka Connect.
Starting point is 00:17:56 You definitely don't want to sit there writing your own programs, pull data when there's kind of that wheel exists already. There's no need to go and reinvent it. But you also see where people have more bespoke systems or applications. There's a huge number of client libraries for Java and C and.NET and so on, where people can actually integrate directly into their applications. There's also a REST proxy. So if you've got an application that wants to pull or push data to and from Kafka and wants to do so over HTTP, you can use the REST proxy to do that as well. And just for a point of clarification, just so our listeners understand. So you mentioned the client libraries and those are Apache Kafka proper. The REST proxy though, that's Confluent platform, is that correct? So the Java client libraries are part of Apache Kafka
Starting point is 00:18:46 and then Confluent, I've got a bunch of different client libraries built around LibRD Kafka for C, C++,.NET, Python, and the REST proxy as well, part of Confluent Platform. Fantastic. So Robin, you mentioned KSQL just a few minutes ago, which is a reasonably new, it's the new kid in the Confluent platform. Do you want to tell us a little bit about what that is and how people are using it, how your customers are using it? Yeah, definitely. So KSQL, it's a SQL interface that enables you to build streaming applications on top of your data in Kafka.
Starting point is 00:19:28 So I suppose what it isn't, and it's kind of important to get this out of the way, is that it is not a way of hooking up Tableau or whatever your analytics visualization tool of choice is. It's not a way to hook that up to Kafka. I mean, you could do, and there is a community JDBC driver for it, but that's not what KSQL is about. So I'm saying that up front just because it's important to set expectations and understandings about what KSQL is. KSQL is for building stream processing applications. It's so cool because if you think about the kind of ways in which people work with data, more often than not, they will use SQL to explore it.
Starting point is 00:20:09 They will be writing SQL statements to say, I'm going to take this lump of data in my data warehouse. I'm going to filter it. I'm going to look for this kind of condition. And that's the interesting insights that you're pulling out of the data, you can take that SQL statement with those where's and those havings and the group by's and so on. And you can run that as a KSQL statement to not only act on all of your existing data in Kafka, but also all of the data as it arrives. And when KSQL runs a SQL statement, it's a continuous query. So unlike when you go and query Oracle or you go and query Postgres or whatever, it's a static query. You run the query and you get some data back. Well, you don't get data back, but you
Starting point is 00:20:48 get a result. And then you have to rerun it if you want to know if the data changed. With KSQL, it's a continuous query because it's running against Kafka and Kafka is unbounded. It's an infinite stream of data. So there may be no new messages arriving at the moment, but there may be some more coming in five minutes, 10 minutes, a year, who knows, but it's unbounded. So ksql queries run continually. And the output of a ksql query goes into a new Kafka topic. You can have it echo it to your console instead if you want, but when you're actually building these stream processing applications, it's writing the outputs to a Kafka topic. And because it's a Kafka topic,
Starting point is 00:21:30 that means it can be consumed by pretty much anything because everything integrates with Kafka. So you can use ksql to build out very complex stream processing applications. You can also use ksql to simply build out building blocks of stream processes, which filter a topic here, join two topics over there, aggregate this data here, and consume the results from that in your own applications,
Starting point is 00:21:52 in data stores downstream for analytics. But however you want to consume your data out of Kafka, you can enrich and modify that data as it passes through Kafka using ksql. Yeah, I think that was a huge benefit when you start thinking about, A, that people are used to SQL as a way to take a look at data. But also, Robin, to someone from our background, we're used to SQL being the language by which we do process data. And I think for Confluent to have acknowledged that and given a layer that allows us to not only query, but also process using SQL is a big differentiator. The other way that we would think about processing data within Apache Kafka is Kafka Streams. Do you want to talk a little bit about the relationship between Kafka Streams and KSQL? Yeah, sure. So KSQL is built on top of Kafka
Starting point is 00:22:52 Streams. So Kafka Streams is an API within Apache Kafka. KSQL is part of Confluent Platform. So KSQL will build out a Kafka Streams topology and execute using Kafka Streams. Kafka Streams is, I suppose, like a lower level API, or rather KSQL is a higher level abstraction on top of Kafka Streams. If you're writing Java, if you want to do stream processing within your Java application, you can bring in Kafka Streams as a library and do your filtering, your enrichment, your transformations, your aggregations within your Java application and deploy it in exactly the same way you deploy your Java applications. You don't need to have a new cluster specifically
Starting point is 00:23:34 for your stream processing and so on. You just write your Java applications as before. Ricardo, anything you'd like to add to the KSQL discussion? Yeah, actually, picking up what Robin was saying about the relationships and the differences between Kafka Streams and KSQL, there is another kind of architectural motivation for why KSQL exists. And it's a minor, but it's a very important one when you are thinking in doing stream processing within your team and your development team. If you think about it, Kafka Streams is a Java library. It's a Java or a Scala library where developers bring up into their applications,
Starting point is 00:24:12 and when they're finished writing their applications, that's going to become JVMs, right? So runtime processes that it will bring data into memory, and it will kind of doing the string processing on it. But there is a problem with that. I mean, although it's cool to bring string processing within your applications, but if you are doing some intense aggregation on it, you might end up with a very bloated and a very large heap JVM, which is going to incur in a lot of memory problems such as garbage collection and stop the work pauses. And that's going to incur in a lot of memory problems such as garbage collection and stop
Starting point is 00:24:45 the work pauses. And that's going to be not very pleasant for the application itself. So one of the architectural motivations for why KSQL exists, not only like Robin explained, providing a DSL, a language that abstracts the whole string processing using Java, using an SQL BASIC language, but the other one is to have their own dedicated cluster where you can run string processing there in their own JVM separated from your applications. And that way, you can kind of scale out your workloads in a string processing layer different from your application layer.
Starting point is 00:25:21 So in the end of the day, KSQL is also a solution for a scalability problem that might raise when you're doing string processing applications. So there you have it. Great. So that's great to understand. I know that when I first started
Starting point is 00:25:34 looking at Apache Kafka some years ago, the standard sort of architecture, at least the use cases I saw, were often with Apache Kafka feeding Spark applications. How does today's lineup of sort of solutions inside of both the Apache Kafka ecosystem and the Confluent platform, how does that sit next to Spark clusters and Spark distributions?
Starting point is 00:26:01 When would you go sort of in one direction versus the other? Either one of you guys want to jump in on that? Yeah, I have some opinions about this design pattern using Spark and KSQL or Kafka Streams. I mean, my main inclination for using Kafka Streams or KSQL is because they were built on top of the consumer API, which is a battle-proving technology that provides you the whole partitioning model, the scalability problem. In the event of brokerage failures or the consumer group failures, you have the whole rebalancing protocol taken in action.
Starting point is 00:26:38 So by having this framework layer built on top of something that's proving like the consumer API, I think we can provide a very similar experience for the ksql developers to not worry about those details, right? And what I see in other string processing framework, I'm not saying they are bad or wrong, or bad or good. What I'm saying is that those building blocks, let's call it building blocks, they are kind of become more relevant for the developers. So they're exposed to these complexities and they somehow that need to solve it by themselves when you are dealing with some other framework. Right. Of course, some of the, I think, Spark streaming, it also is also based on some sort of a consumer API from Kafka. In a very underlying level, it also leveraged the same APIs.
Starting point is 00:27:29 But I'm pretty sure that some of those building blocks often come up when you are doing Spark streaming, micro-batching. Because the semantics of doing processing is different. So I think the main difference is how for the upper layer development, those building blocks are abstracted. And that's one of the things that KSQL and Kafka Streams do very well, which is abstract. The underlying complexity is about partitioning, scalability, rebalancing, and failover. Excellent. Robin, did you have any feedback on that or any follow-up on that? It comes up pretty much all the time when we talk about Kafka Streams and KSQL. I suppose just on top of what Ricardo said, sometimes it's going to be a bit more mundane reason, which is just an existing technology is there.
Starting point is 00:28:19 And so if someone's already using Flink, they're already using Spark Streaming, I wouldn't particularly advocate go and rip it out and replace it because that's kind of fairly pointless unless there is a specific thing which it doesn't do that one of the others does. So it's like with all technologies, it's always fun to kind of use something different. There's pretty much feature parity on most things across most of these tools. Some of the older ones are kind of less frequently updated nowadays and a bit long in the tooth, so you may not opt for those. I think it's when you're starting from a greenfield and you think, well, my data is going to be in Kafka. It's definitely going to be in Kafka.
Starting point is 00:29:00 It's an event-driven system, so we're using Kafka. That's step number one. Step number two is how much broader do I want my technology footprint to be? And you might think, well, I'm going to use this other thing for this particular reason, and that's fine. But I think my guiding rule on this always is like, well, I'd start off with what's in the box already. So I've already got Kafka Streams. If I want to use SQL with it, I've got KSQL on top of it and kind of broaden out from there. Yeah, I think that makes sense.
Starting point is 00:29:26 It's like, do I really need another cluster? I mean, that's what you're talking about. Spark's not just another library that you add. It's actually another cluster you add. So I think the question that I always talk to customers about is do you need another cluster? Maybe you do. And if you do, then let's build one.
Starting point is 00:29:44 But if you don't, let's keep the one cluster we have. Is that about sound right? So talking, so did you want to follow up on that, Ricardo? No, just a quick comment about some of the motivations that we've seen at Confluent about why some customers kind of choose Spark Streaming. It's not necessarily a technical motivation, but sometimes it's more like, or they are a development firm that are specialized in doing Spark streaming. And they kind of have their own kind of a task force specialized on that technology.
Starting point is 00:30:15 So sometimes it's not just about choosing the technology because why it is best, but sometimes it's because it's that knowledge that they have for doing something processed. So they settle for where the data is, which is Kafka, and they use Spark or Flink, because that is the technology that they have been building, they are processing for the last five years. So sometimes it's some market trends that we've seen that's not necessarily tied to technical aspects. Yeah, and that's a great lead-in to, you know, we were sort of at a high level talking about
Starting point is 00:30:49 trying to make developers' lives easier, at least architecturally, making it easier to run systems. And I think that's a good lead-in to the Confluent Cloud. And, you know, there's been some really great announcements. It seems like you guys just keep hitting them with more announcements around Confluent Cloud and different options and more availability and et cetera. Ricardo, do you want to maybe just give us a high level of where Confluent sits with your cloud offerings and sort of give us a lay of the land
Starting point is 00:31:27 so that we can see the different ways we might think about that. Sure, sure. I think that the best way to explain this and Confluent Cloud is to discuss a little bit what managed servers are. If you think about, if you go back 10 years ago when the whole cloud thing kind of started,
Starting point is 00:31:44 we were thinking in fundamental building blocks, which is basically infrastructure, like making sure infrastructure was so easy to consume that developers could focus on what they do best, which is writing code and PaaS platform as a services component, where not only infrastructure was provided as a service, but also kind of a pre-built frameworks and components that developers could simply spin up and use and shorten their development times. So one great example that I would like to give about managed services is BigQuery from GCP, for example. So if you want to work with TerraScale kind of a database and you don't want to worry about how to set up, how to install it, patch it, secure or manage, you can simply go to the GCP console or CLI
Starting point is 00:32:37 and spin up your new BigQuery table. And there you have it. I mean, five minutes later, you have an up and running big TerraScale database that you can start using it and hooking up with your application. So the whole, although some people kind of say that cloud computing is all about reducing costs, my take on this is that cloud computing is also about making sure you become truly agile.
Starting point is 00:32:59 So you build things faster and managed services are a very good indication where we are leading towards that direction. So going back to your original question, Stuart, so what is Confluent Cloud? It is a managed service. So it's a way to offer our customers Poshy Kafka as something that they don't necessarily have to worry about how to install, how to patch it, how to provision, how to manage, how to scale it. And the end result of this is that, hey, five minutes later, you don't worry about the Kafka cluster anymore. You don't worry about schema registry anymore. You don't worry about some of the services that we are introducing into Confluent Cloud such as ksql or Kafka Connect, and you jump straight to what really matters and what really provides value to
Starting point is 00:33:45 the organization, which is building applications. Yeah, I agree. I tweeted the other day that your newest offering, it's just give me an API and some SQL. As a developer, that's all I need. Robin, what does that really mean for developers? And you guys are both in the line of work where you're trying to ease the friction for developers. What does it really mean for a developer to have these kind of options to bring Kafka into their architecture? So I suppose by making it all available as a managed service, it's one less thing to have to worry about
Starting point is 00:34:25 and get set up before you can actually start being productive. So if you've got your data flowing through Kafka, which obviously persists the data for as long as you want it, as a developer, you can now spin up a KSQL instance and start writing your streaming queries against that data and transforming it and enriching it and writing it somewhere else without first having to worry about setting up a cluster, managing that cluster, and so on.
Starting point is 00:34:51 Yeah, excellent. And so you guys had a big announcement at Google Cloud Next this year. Ricardo, do you want to talk a little bit about what that announcement was and what it means for listeners of this podcast? Sure, sure. So the announcement pretty much was that GCP, Google Cloud, were partnership with some very strategic companies to bring a more clear and open source version of their key technologies.
Starting point is 00:35:19 So I'm going to mention two of them. So the first one is going to be Radis. So GCP is providing a first Radis clusters as a first class system. And pretty much what they did was partner up with Radis which is going to take care of the whole clustering and provisioning for them. But what is more important from that is that for the GCP customer or user, they're going to be able to spin up Radis clusters straight from their GCP console or CLI. Same goes for Apache Kafka clusters.
Starting point is 00:35:49 So what GCP did was partner up with Confluent. So pretty much all the clusters that developers will spin up from GCP or CLI will be actually provisioned by Confluent and more importantly, managed by Confluent. So what that means for the users is a peace of mind that their experience within that specific cloud provider is going to be all the same. So the same simplicity that they spin up the query tables, they're going to spin up Apache Kafka clusters. So that is a value that the GCP as a cloud provider is bringing to the users, which is pretty cool.
Starting point is 00:36:25 But more important than that is the relationship about how GCP is outsourcing their cloud experience to what we call the domain experts. I mean, Confluent is known for having a pretty good and very large knowledge in terms of how to provision cluster in the cloud. So it's kind of a smart move from Google to kind of rely on domain experts for doing that instead of building their own cloud services by themselves, which it's not very scalable because they're, again, they're not experts on Apache Kafka. So that's pretty much what the announcements were. Yeah, that's great. I mean, Mark spends a lot of time on this podcast talking about offerings inside of gcp and also the listeners to this podcast are
Starting point is 00:37:11 regularly hearing about you know how to make their lives easier how the cloud can make their lives easier and i think this last uh um announcement really does um speak to what it means to really make Kafka available and Confluent in general available to a much wider audience, those organizations that may be smaller or don't have infrastructure and don't have the expertise to run big systems. Now it's really just a few clicks away. So I think we're going to wrap up here. And so, Robin, you want to tell the listeners how perhaps they might find out more about Apache Kafka and the Confluent platform
Starting point is 00:37:56 and Confluent in general? Yeah, so confluent.io is our website. You can go and download it from there. We've got a bunch of quick start tutorials. If you want to try it for yourself. You can go and download it from there. We've got a bunch of quick start tutorials. If you want to try it for yourself, you can go and download it. We've got an examples repository on our GitHub. You can go and try them there.
Starting point is 00:38:14 There's one called Demo Scene as well. It's all on Docker, so it's easy to just spin the whole thing up. So there's some good places to get started. And Ricardo, anything to add there? Our listeners may want to try some things out. Yeah, I actually would like to recommend taking a look for Confluent Cloud. I mean, it's very, very easy to start using Kafka through that route. I mean, if you go to Confluent Cloud and create a URL account, I guarantee you that five minutes later, you will have your Kafka cluster running. So I
Starting point is 00:38:49 think it's important for developers that are trying to focus on the developing part and go into Confluent Cloud, which is basically confluent.io slash cloud. you're going to end up there. So we do have lots of repositories that shows code pointing to Confluent Cloud. So I think that's our point is to make developers life easier as we go with our jobs. Robin, anything to add on that? Yeah, just one more thing that I forgot to mention originally. We've got a Slack group, the Confluent platform Slack group, Confluent community. There's like 9,000 people on there. There's tons of people from the community. There's Confluent people on there. So that's a great place to go.
Starting point is 00:39:33 If you've got questions about specifics of this, there's different channels for each different part of Confluent platform. There's also a mailing list and there's also Stack Overflow and places like that as well. Some good resources there. That's fantastic. So we'll make sure that we put the Docker links, the Confluent Cloud links, and also the Slack links in the show notes so that our listeners can get to that easily. So Robin, Ricardo, really appreciate you guys taking some time today to join us on the Drill to Detail podcast. Thanks again. And for Mark Rittman, this is Stuart Bryson. Thanks for listening. Thank you.
