Orchestrate all the Things - Cloud modernization and real-time data is how you cut down costs during downturns according to Striim. Featuring Alok Pareek, Striim Co-founder and EVP of products
Episode Date: November 30, 2022
Can technology, and real-time technology in particular, help companies achieve savings during economic hardship? Alok Pareek thinks it can. Pareek is the Co-founder and EVP of products of Striim, a vendor whose goal and motto is to "help companies make data useful the instant it’s born". Depending on which angle you look at it, you could say that Pareek is either biased or in the know. Either way, it was not so long ago that real-time data, or streaming data as this market is also called, was estimated to be worth billions. But then again, as the recent wave of layoffs and market capitalization losses goes to show, many projections around technology are off the mark. Could real-time data be different? Where does cloud modernization come into play and how does Striim's offering relate to that? As Striim today announced the availability of its fully managed Striim Cloud service on Amazon Web Services (AWS), we connected with Pareek to discuss.
Transcript
Welcome to the Orchestrate All the Things podcast.
I'm George Anadiotis and we'll be connecting the dots together.
Can technology, and real-time technology in particular,
help companies achieve savings during economic hardship?
Alok Pareek thinks it can.
Pareek is the co-founder and EVP of Products of Striim,
a vendor whose goal and motto is to help companies make data useful
the instant it's born.
Depending on which angle you look at it, you could say that Pareek is either biased or in the know.
Either way, it wasn't so long ago that real-time data, or streaming data as this market is also called,
was estimated to be worth billions.
But then again, as the recent wave of layoffs and market capitalization losses goes to show,
many projections around
technology are off the mark. Could real-time data be different? Where does cloud modernization come
into play and how does Striim's offering relate to that? As Striim today announced the availability
of its fully managed Striim Cloud service on Amazon Web Services, we connected with Pareek to discuss.
I hope you will enjoy the podcast.
If you like my work, you can follow Linked Data Orchestration on Twitter, LinkedIn, and Facebook.
I'm one of the founders of Striim, and I run all of the product areas, including engineering
and product management and strategy.
And by way of background, I used to be the chief technology officer of a company called GoldenGate Software. Are you familiar with GoldenGate, George?
I came across some text that said a few words about the founders, mentioning that the company was founded by people who had done such and such, including having founded WebLogic and sold it to BEA. And I was a little bit touched by that, because WebLogic was a technology I used to work with very extensively, and I have good memories of doing so. So I looked it up a bit further, and it turned out that it wasn't you who did that, actually; you have a different kind of background. But still, it was an interesting touch for me.
Yeah, it was interesting. My co-founder Ali Kutay was
the CEO of WebLogic, and then went through the BEA acquisition.
And then ultimately, Oracle ended up acquiring that.
And Oracle also ended up acquiring GoldenGate.
So it was interesting.
So, as for Oracle's middleware, at least, these are two pretty significant products in their
product portfolio, with easily billions of dollars in revenue
on an annual basis, actually, if you take a look at maintenance revenue now.
In terms of background, I spent a number of years at Oracle,
primarily in the database development team, working on features all the way from,
gosh, when I joined, it was Oracle 6.0.36,
to Oracle 11g by the time I left.
So a lot of my focus was on recovery and high-speed data movement in the Oracle database
kernel itself. And then I spent several years post-acquisition running Oracle's data integration product
portfolio, which included the data integration product, the replication product, as well as
data quality products that we had
acquired along the way.
Okay, so that's kind of my background.
And let me just quickly get started.
Like I said, I might be referring to some slides.
So just very briefly, on our focus at Striim: we've been doing this for close
to 10 years now.
So we want to power all operations
and decisions in real time.
Real time is our focus.
And we really want to make sure
that companies and enterprises can connect
pretty much with their partners,
with their suppliers, with their customers,
via data flows for the digital economy.
And we aim to make sure that the data is flowing in real time.
And so with that, our product vision is to deliver a unified data integration
and streaming service to the market.
And by unified, we mean it's a single platform that brings together both the real-time data integration
tenets, as well as all of the streaming analytics capabilities.
So we do have a full-blown query processing layer that allows you to operate on the data
in flight.
And why do we want to do that?
It's to get from data to decisions in real time.
If you took a look at a data management landscape today,
you'd probably end up putting together three, four,
or five different products to address what Striim does.
And so our core backbone allows this to be done in an easy-to-use, easily manageable fashion,
so you don't have to actually struggle with, you know, deploying different technologies and bringing
it all together yourself. Just in terms of overall adoption, you know, we've been very successful
in multiple different areas, be it logistics or travel,
customer loyalty in retail, supply chain in retail.
And to that point, it's great that you have a background
with such large companies,
in the Fortune 500 and the like.
So we really are running state-of-the-art
digital services in multiple mission-critical
environments. And I'll talk about that a little bit more. And then the team comes in, like you
are aware, from WebLogic, GoldenGate, Oracle. And our most recent round was our Series C round
that was led by Goldman Sachs. And we also have several other prominent investors,
as you can see. And we have pretty strategic partnerships with Microsoft and Google,
where there are actual agreements with these guys for their customers on several data
modernization projects. And then we also have partnerships with AWS, Databricks, and Snowflake. And this is just an eye chart of our customers.
Obviously, like I mentioned, this is a horizontal technology.
It cuts through financial services, healthcare, transportation, logistics, retail, high-tech,
telco.
So pretty much a wide range of use cases.
And with the use cases, we can think of them in two different ways.
One is where you have the data architects and the ETL developers and data engineers
who are trying to build the foundation, the infrastructure layer,
to make sure that data can be easily made available for consumption across your ecosystem
in real time.
And the real-time part is important because that differentiates us from a lot of the batch
oriented technologies such as Informatica and Talend and all of the products in that
space.
So with that in mind, the key use cases here tend to be cloud data integration, data modernization, trying to put together a customer 360, you know, potentially in like an analytic system.
Many folks are looking for just a real-time change data capture technology, trying to do streaming analytics, streaming ETL, real-time analytics. And this is sort of something that,
as we look across our customer base,
this is how they are using us.
So it's not something that we are going out and saying,
hey, please use us this way.
But because it's a platform
and it has a breadth of functional areas within the product,
these are the common use cases for Striim. So this is more of a technical-oriented use case.
The second one is more of a business user, application user-oriented use case,
where application developers or BI analysts might actually come in and they are trying to deliver,
you know, some value to the business, like, you know, improve customer experience,
you know, better patient care, trying to make sure that across your multiple fraud detection systems there are
still no gaps, and trying to make sure that, you know, you can look across a
spectrum of different products as data is coming in in real time, to add further
logic there.
And the list goes on in terms of fleet cargo
and parts management.
We have one of the largest airlines in the world
as our managed services customer
on the cargo and the part side.
Real-time dynamic pricing,
marketing promotions, omni-channel.
So I'll get into maybe one or two specific use cases later on,
but just wanted to make sure
that these are sort of the core use cases of how we go to market. And then what we're capitalizing on are these
six architectural shifts. This is something that came from McKinsey Digital, you know, a year and
a half ago. They're very well aligned with what we see in the market: a shift from on-premise to cloud-based
data platforms, from batch to real-time data processing, from pre-integrated commercial
solutions to best-of-breed platforms. We see many of the teams trying to actually look at
architectures like data mesh or data fabric, where they're trying to get away from point-to-point connections and decouple data access. Also, coming from the 2015, 2016,
2017 era, when people were busy building a Hadoop-based data lake,
they're trying to move away from that a little bit towards more of a domain-based architecture,
where there are purpose-built applications, which might be called data products, that are housed by the domain
folks who best understand the domain and the data. And then you try to go from rigid data models
to more open formats like JSON and Parquet and Delta and so forth. So we find ourselves in the
middle of these architectural shifts, and we are the enablers here,
you know, as we help our customers.
So a conceptual, simple way to look at this is,
you know, a majority of the businesses today
might have, you know, their applications running,
you know, using a number of different areas
in the stack on premise,
or maybe, you know, more recently in the cloud.
And we help move the data across these applications
and databases and Kafka queues,
or even data coming over the web
to these different endpoints,
cross cloud, multi-cloud in real time.
And that's really how we help enable
a key piece of the digital transformation.
So I'll maybe stop here just for a few seconds
just to make sure at least at the high level,
the picture is sort of clear as to where we fit in, George.
Yeah, it is pretty clear.
And actually, to be honest with you,
that aligns with the understanding I had already
kind of looking up what Striim does.
And even by name, it was pretty clear that you're in the real-time streaming data space.
What I was more interested in, to be honest with you, was, well, what precisely your differentiation
in that space is, let's say, compared to all the other options?
Because as you obviously know,
there's a number of other options in that same domain as well.
So let's try and zero in on that one.
So I would start by asking you,
so do you actually cover both data transfer and data transformation?
So can people do data transformation using Striim?
Yeah, great question.
And I'm just going to land right on the platform itself.
And this will actually show you the various capabilities, as well as I'll point out some
of the unique things that we have done, which sort of take us away from many of the traditional
players in this market.
So let's actually start with the data sources. So as you take a look at
it, we do tend to support hundreds of different data sources. And these could be databases,
log files, messaging systems like Kafka or IBM MQ or JMS-based systems. Data coming in
from sensors, data coming in from just over the web.
So in a straightforward, you know,
data integration type of a scenario,
you know, what we do is we do continuous data collection as opposed to sort of batch data collection.
And that's a key piece.
I'll just get into that in a second.
And this technology is called CDC, change data capture.
So there are a few players in this market.
Clearly, you know, there's Oracle GoldenGate, which was done by us, and then now housed at Oracle.
IBM has a product as part of the InfoSphere family; they had acquired a Canadian company called
DataMirror, which addresses that space.
And there are a few smaller players that we had seen a while ago, which constitute the
remainder.
One is now owned by Qlik.
One is now owned by Fivetran.
So we're sort of an independent, you know, standalone company that actually provides this.
In the simpler use case, we simply deliver, you know,
these change streams onto a number of different target systems.
So, for example, a MongoDB to Azure Event Hubs, or a MongoDB to Snowflake, or an Oracle to Google BigQuery: these might be simple, straight-through real-time data integrations. But you might also ask: what happens as the data is moving? And here's where the capability of the platform can be leveraged to do data filtering, data aggregation, transformation, and enrichment. Contrast that with the streaming technology out there today. So for example, let's take Confluent, or let's take Apache Flink, or any kind of a streaming system like Spark Streaming. Usually
they do not have the CDC capability. So they would still end up using
Striim to move data in, and we have many customers doing this. If you have, let's say,
data coming in from a CRM system or a Salesforce application, getting that into just Kafka, you would actually use Striim. And then beyond that, you might use Confluent,
which is actually supporting the Kafka platform or the Kafka cluster itself. Now, once we bring the data in, it's your choice, because we do offer all of the SQL capabilities
now, once we bring the data in, it's your choice because we do offer all of the SQL capabilities
for doing continuous query processing, right?
So this is where, you know,
you don't have to now put together your own,
you know, streaming layer,
which has to do with the processing part of it.
And this is a key separation.
I find in the industry,
the most common confusion is a lot of people think
that if you do streaming ingestion
with a backbone, like, you know,
Kafka or Apache Pulsar,
then you are done with streaming. But there's also the stream processing part of it. And there's also
the in and out pieces of it. Like, why are you actually doing it? For whose purpose? Where are
you getting the data from? Where are you delivering the data to? So the unique thing that we are doing
is we have taken the endpoints and we have fused that together to make sure that
all of the streaming capability that you need from, you know, ingestion, particularly from
change data capture set of data sources, which otherwise are very difficult to consume data from
on a continuous basis. You cannot just keep polling database tables all the time because of,
you know, operational concerns. So from streaming ingestion and CDC to stream processing,
to stream storage, to stream analytics,
and then finally to actually stream visualization
and delivery, we are taking all of these different
capabilities in the streaming system.
And we have a comprehensive platform
that addresses all of these.
And that's sort of like what's different.
That's why we call it unified.
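To make the change data capture idea concrete, here is a minimal, hypothetical Python sketch of what CDC consumption conceptually involves. The event shape, field names, and apply logic are illustrative assumptions, not Striim's actual format or API: rather than polling tables, a consumer applies a continuous stream of insert, update, and delete events to a target.

```python
# Hypothetical sketch of change data capture (CDC) consumption.
# The event shape and apply logic are illustrative assumptions,
# not Striim's actual format or API.

change_events = [
    {"op": "INSERT", "table": "orders", "key": 101, "after": {"status": "NEW"}},
    {"op": "UPDATE", "table": "orders", "key": 101, "after": {"status": "SHIPPED"}},
    {"op": "DELETE", "table": "orders", "key": 101, "after": None},
]

target = {}  # stands in for Snowflake, BigQuery, a Kafka topic, etc.

def apply_change(event):
    """Apply one change event to the target, keeping it continuously in sync."""
    key = (event["table"], event["key"])
    if event["op"] == "DELETE":
        target.pop(key, None)
    else:  # INSERT and UPDATE carry the new row image
        target[key] = event["after"]

for event in change_events:  # in production this loop never ends
    apply_change(event)

print(target)  # {} -- the row was inserted, updated, then deleted
```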
And I'll give you an example of enrichment. Let's say you wanted to actually join reference data as data is coming through.
In a retail environment, for example, data is coming in from a digital
channel. Let's say you come to a website and then you place an order, and the order management
system is another application that could be running on an IBM mainframe. Getting these two events together,
either to spot patterns
or to alert somebody
because this is a high-profile customer,
you quickly want to go ahead
and cross-correlate that in real time.
And that's where this enrichment piece comes in.
You have reference data that you can preload
and you can actually combine that.
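As a rough illustration of that enrichment step, here is a hypothetical Python sketch, not Striim's API: reference data is preloaded into memory and joined against each event in flight, with all field names assumed for the example.

```python
# Hypothetical in-flight enrichment: join streaming events against
# preloaded reference data. Field names are illustrative assumptions.

customers = {  # reference data, preloaded once (e.g., from a CRM extract)
    "c42": {"name": "Acme Corp", "tier": "platinum"},
}

def enrich(order_event):
    """Attach customer attributes to an order event as it streams through."""
    ref = customers.get(order_event["customer_id"], {})
    return {**order_event, **ref}

event = {"order_id": 7, "customer_id": "c42", "amount": 250.0}
enriched = enrich(event)

if enriched.get("tier") == "platinum":
    print(f"ALERT: high-profile customer order {enriched['order_id']}")
```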
And as you go through, right,
you are able to do continuous queries and windowing.
And the continuous queries have to do with the fact that
I can look at data as it's coming in every minute
or every 15 minutes, every five minutes
and have a set of queries that are specified.
So this is a push-based notification to me
as opposed to me polling, right?
And I wanna be careful about these two use cases.
One of them is I move data in real time to, let's say, Google BigQuery, and I run a set
of applications using queries on Google.
The second is, as I'm moving the data using Striim to Google BigQuery, I can define
data windows of one minute, and I can say, hey, just give me sales from a specific store,
if I have hundreds of stores all over North America. And then if the sales are underwhelming,
then go alert somebody in real time before it even makes it to Google BigQuery. So that's a
very different paradigm and we are bringing that into the integration landscape here.
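Here is a hypothetical Python sketch of that second paradigm; the threshold, field names, and alert channel are illustrative assumptions. A continuous query evaluates each one-minute window of sales and pushes an alert before the data ever lands in the warehouse.

```python
# Hypothetical sketch of a one-minute tumbling window over a sales stream,
# alerting on underperforming stores before the data lands in the warehouse.
# The threshold and field names are illustrative assumptions.

from collections import defaultdict

THRESHOLD = 1000.0  # minimum expected sales per store per window

def process_window(events):
    """The continuous query for one window: total sales per store, then alert."""
    totals = defaultdict(float)
    for e in events:
        totals[e["store"]] += e["amount"]
    for store, total in totals.items():
        if total < THRESHOLD:
            print(f"ALERT: store {store} sold only {total:.2f} this window")

# One window's worth of events; in production they arrive continuously,
# and results are pushed out rather than polled for.
window = [
    {"store": "NYC-01", "amount": 1500.0},
    {"store": "SEA-07", "amount": 320.0},
]
process_window(window)  # -> ALERT for SEA-07
```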
And then the more advanced use cases tend to be where you could actually do multi-stream
correlation. You could do pattern matching. So windows are obviously either time-based or they
are batch-based. So you could say that maybe every hour do this for me or every 100 events do this
for me. You could also define pattern-based windows, where you say, hey, my window starts
with a specific punctuation,
that is, a specific pattern starting with a keyword, and then it also ends with a pattern.
And then that becomes your window because you want to analyze things there.
And that's super useful for doing session management and so forth in real time.
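A minimal Python sketch of such a pattern-delimited window follows, with the start and end markers assumed purely for illustration: the window opens on a start pattern and closes on an end pattern, which is what makes it useful for sessionization.

```python
# Hypothetical pattern-delimited window: instead of time or event counts,
# the window opens on a start marker and closes on an end marker.
# The marker values are illustrative assumptions.

events = ["LOGIN", "view:home", "view:cart", "purchase", "LOGOUT",
          "LOGIN", "view:home", "LOGOUT"]

def sessions(stream, start="LOGIN", end="LOGOUT"):
    """Yield one window (a list of events) per start/end-delimited session."""
    window = None
    for e in stream:
        if e == start:
            window = []                 # start pattern seen: open a new window
        elif e == end and window is not None:
            yield window                # end pattern seen: close and emit
            window = None
        elif window is not None:
            window.append(e)

for i, session in enumerate(sessions(events), 1):
    print(f"session {i}: {session}")
# session 1: ['view:home', 'view:cart', 'purchase']
# session 2: ['view:home']
```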
So we've added that piece into the integration pipeline. And one way to consume this is through very simple use cases, like I mentioned, real-time
alerting, real-time triggers, expressing real-time ad hoc queries, building real-time dashboards,
and then incorporating your own machine learning and AI.
Now, that's a lot, but I want to be careful that the way we are coming
to market is we're saying for the real-time pieces of the integration and analytics,
Striim is the platform. For your long-term batch-oriented analytics, where scale is needed,
scale as in I may want to actually go ahead and run a query against one year's worth of data,
that's where now you're moving outside of Striim into systems like Snowflake and BigQuery and so forth.
Does that make sense?
Yes, there's something else that I wanted to ask you specifically about.
So you did mention that the platform does offer a SQL interface as well. At the same time, you also mentioned different options that people have for defining pattern-based matching or time-based matching and so on.
Are those two combined at all in any way?
Or if not, do you have specific SQL extensions that people can use for that?
And how can people find the exact patterns
that they want to match?
Yeah, fantastic.
Great question, George, by the way.
So we do have a variant of SQL
and then we have introduced our own extensions to it.
So for example, to the specific question you asked
about how do you do pattern matching?
So we actually have a function, right, which is really a pattern matching operator. And then,
you know, you can invoke that as a UDF in SQL, and then you actually pass in a
regular expression to apply to the event. And that looks for that specific regular expression within
the event itself. Okay, that's how you actually go ahead and express that logic.
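Conceptually, that operator behaves like the following Python sketch; the function name and event payload are illustrative assumptions, not Striim's UDF API.

```python
# Conceptual sketch of a pattern-matching operator invoked per event, the
# way a UDF would be called from a SQL continuous query. The function name
# and event payload are illustrative assumptions, not Striim's API.

import re

def match_pattern(event_payload: str, pattern: str) -> bool:
    """Return True if the regular expression occurs within the event."""
    return re.search(pattern, event_payload) is not None

# e.g., flag log events that contain an error code like "ERR-0042"
print(match_pattern("2022-11-30 ERR-0042 checkout failed", r"ERR-\d{4}"))  # True
print(match_pattern("2022-11-30 checkout ok", r"ERR-\d{4}"))               # False
```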
If there's something that's difficult to express in SQL, then you can actually extend the
platform: this is primarily a distributed Java-based platform,
so you can link in with your own code as well. So we have an interface, a component
called the open processor, where we actually publish
a set of interfaces for initialization and then runtime and so forth. And then you can actually
get the result back into the platform. And that's how you could actually write your own ML, for
example, or write your own data cleansing rules on the data, or you
could actually do your own auditing with third-party systems within that layer, for example.
I'm citing real examples of what our customers
end up doing with that open processor.
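Since the platform itself is Java-based, the real extension interfaces are Java; purely for consistency with the sketches above, here is the open-processor idea rendered in Python, with hypothetical method names. The platform calls an initialization hook once, then a runtime hook per event, and the returned event flows back into the pipeline.

```python
# Hypothetical rendering of the "open processor" idea: the platform
# publishes initialization and runtime interfaces, you plug in your own
# logic, and results flow back into the pipeline. Method names are
# assumptions; Striim's actual interfaces are Java-based.

class OpenProcessor:
    def initialize(self, config: dict) -> None:
        """Called once before the stream starts (load models, rules, etc.)."""
        raise NotImplementedError

    def process(self, event: dict) -> dict:
        """Called per event; the returned event re-enters the pipeline."""
        raise NotImplementedError

class CleansingProcessor(OpenProcessor):
    """Example custom logic: simple data-cleansing rules."""

    def initialize(self, config):
        self.default_country = config.get("default_country", "US")

    def process(self, event):
        event["email"] = event.get("email", "").strip().lower()
        event.setdefault("country", self.default_country)
        return event

proc = CleansingProcessor()
proc.initialize({"default_country": "US"})
print(proc.process({"email": "  Alice@Example.COM "}))
# {'email': 'alice@example.com', 'country': 'US'}
```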
Okay, so the platform does offer an API,
I guess, that people can use.
However, if I got it right, it's not open source, right?
So it's proprietary and it comes in flavors, basically,
so people can use it on premise or they can use the self-managed cloud version or they
can use the fully managed cloud platform that you offer, which is actually the occasion
for having today's conversation.
So let's focus a little bit on that.
Sure.
Great, great question.
I love the flow.
It's just pretty much in line with what I thought we would want to talk about. So the product itself is offered either as a platform
that you can download and run in your own data center, or you can host it yourself:
we are available in the marketplaces, where we run in our customers' account.
And finally, there are our managed services, where we run it in our account.
And then you basically don't have to worry about any of the manageability, monitoring,
patching, upgrading; all of that stuff you leave up to us, and you're really focused
on your business logic.
So there's some flavors of these solutions as well.
And I think if I just take a step back
on the left side of this slide,
you have Striim Cloud and Striim Platform.
So the Striim Platform,
you can either run on-premise
or in a self-managed cloud
on all three marketplaces,
the popular ones:
Google Cloud, AWS, and Azure.
Striim Cloud is a fully managed SaaS solution. So this is where we run everything for you. And I mentioned the largest airline in
the world is running this thing with us. And then we have also introduced a set of
newer products that we call our data products. So these are very purpose-built, where the idea is
that the end user already knows that I want to actually go
ahead and use Striim to move data for BigQuery analytics.
And in this specific offering, the user experience
is very different.
It's more intent-oriented.
And you can't just do everything under the sun.
So in Striim Cloud, for example,
I could take data from an Oracle and a Postgres system as data sources,
and I could take different tables
using change data capture,
one writing to a Kafka topic,
one writing to Snowflake,
one writing to, say, Amazon S3.
Those capabilities of like,
hey, I want to have full power and do whatever
I want in a real-time integration pipeline, you don't get all of those capabilities in
our data products.
In our data products, we know that you want to do your analytics on BigQuery and you want
real-time data.
So the user experience is actually pretty simple and very tight.
It just pretty much invites you to say,
give us your BigQuery credentials,
what are your sources?
And it automates the entire thing for you.
And that is the big launch that we just did on BigQuery.
We are just this week going to be announcing that
as preview for Snowflake.
And then also to be soon followed by Azure Synapse and Databricks.
So that's the direction we're going in.
This is feedback from our customers where they also want to have
very purpose-oriented data products so that, you know,
we don't give them the entire, you know, flexibility.
I mean, some customers want it, particularly large enterprises.
But we've seen small to medium teams
that don't want to actually go ahead
and understand everything, they're like,
well, just take me to a simple set of UI screens
and let me just configure it.
And then off you go.
And we automate the entire schema creation,
the initial historical load, the change data capture,
the alerting, the monitoring.
And then all you do is sit back and just,
this runs in the background for you. Does that make sense? Yeah, it does. It also brings me to
another question. So now that you have touched upon the different flavors in which people can
use the Striim platform, the actual news that you're about to announce is that you're expanding one of those offerings, actually.
I think it's the one that's managed by Striim, right?
So you are now making it available also on AWS Cloud.
And I think it was previously available on Google Cloud.
And was it also some other?
Was it also Azure?
Yeah, it was.
So I think you're doing things a little bit backwards, in a way.
So most companies would start with AWS and then move from there.
Was there a particular reason why you did it differently?
Yes, yes, that's a very good question.
I saw that in the briefing notes as well.
So, you know, one of the things is that, you know, what's really driving this market, right, is the cloud data integration for analytics market, right?
We see a lot of pull when people are trying to actually go in and, you know, do analytics in the cloud for, let's say, on Azure Synapse or on Databricks or on, you know, maybe BigQuery and Snowflake and so forth.
And, you know, the other thing is our change data capture differentiator is huge. So if you take a look at Azure, right, they have a huge install base of databases in SQL Server, right? If you
take a look at, you know, both Google and Azure, we work with those teams very closely because they were
super interested in getting real-time data into their successful offerings like BigQuery and
on Azure into Postgres there, because they already have a database to offer. So it was a conscious decision. And the
other thing, the second point there, is that our focus was more on enterprises.
And we saw that with AWS, the target market was maybe not necessarily the very large ones when it came to data management.
Those were more in the other two CSPs.
So we started there.
But at the same time, once we had enough of a critical base, now we are moving to
AWS on a managed service. On AWS, we always offered our Striim Platform to be deployed by you.
In fact, we have customers like UPS and all these guys who have actually used it for a long time. And
now is the first time we're doing a managed service there. So that was the reason for that.
Okay, I see. So I gather it has probably a lot to do with the fact
that you mentioned earlier
that you have been working in closer collaboration
with Google and the Microsoft team.
And so you mentioned, for example,
the partnership you have around BigQuery.
And so it was kind of a business-driven decision, let's say.
Absolutely, absolutely.
I think most of the customers that we were seeing
in our on-prem pool were trying to get data
into these two CSPs first.
And they were really naturally telling us.
And our partnerships with these guys, like I mentioned,
were commercial in nature.
And that's something newer on the AWS side. In fact,
we are at re:Invent this week. And with the two others, with both Google and Azure, these have been
existing relationships for many years. So we've been closely collaborating with their product
teams. And that's the reason why we offered our services there first.
I know we are up at 8:30, so, my last two slides. So this is what's upcoming, George. I
kind of talked about it. One thing that I didn't mention is our Striim Developer version. So that's
actually going to be previewed very shortly, as early as this year, and then also going GA in Q1.
So that's actually a freemium offering that lets you get your hands on the platform. And this
is really to target it towards the developer community where they don't want to necessarily
go out and be concerned with payments and so forth. So it has a few swim lanes, but it actually
gives you a great way to interact and start using the product. So we are hoping that that's going
to be a very good, you know, exposure for
Striim to the broader developer community.
We're also coming out with application connectors.
So today we have Salesforce and a few others, but we are broadening that suite of
application connectors based on our customer feedback.
Okay, great. Thanks.
Well, that was in my list of questions anyway. I wanted to ask what's in your roadmap following this announcement that you're about to make.
And the other thing I wanted to ask was, well, your take on the broader landscape, let's say, of real-time data or streaming or whatever it is you want to call it. So up until recently, the analysts have been quite optimistic
about the outlook for the market.
And well, as you also mentioned in the introduction,
it's sort of the new paradigm, let's say,
where people want to be able to respond to changes
and events that happen in real time
and therefore platforms such as yours.
However, the economic climate is not the same as it used to be when those forecasts were
originally made.
And we've seen a number of layoffs in the tech industry recently and a general downturn
in the economy.
Do you think that these forecasts are still valid?
Do you see some kind of a special way, let's say, that this economic climate will impact
your industry and your domain as well?
Yeah, I mean, good question, George.
And again, I wish, you know, there was a yes or no Boolean type of an answer.
But I can give you my opinion.
I think in general, what we have seen is that, because Striim particularly tends to play
more in the mission-critical landscape,
we are right in the heart of things, because this is a continuous pipeline.
So if anything, when the broader market sentiment is what we are seeing
with maybe some of the things in the industry slowing down,
I think technology, particularly real-time technology
is becoming more and more prevalent,
because we see our customers
actually accelerating their cloud modernization initiatives,
because I think that's how they save costs. Nobody wants to manage their own infrastructure and so forth. So has it impacted us? Maybe a
little bit where people have slowed down their decision-making process and they're taking more
time. So that I think we've seen some glimpses of that. But at the same time, I think if I just go
maybe a 12 to 24-month horizon, I'm not that worried about that. I think we are
seeing, at least for the real-time parts of it, because, you know, 80, 90% of ETL is still
homegrown and on-prem and, you know, poorly executed through these scripts that often fail.
You know, that's a very labor-oriented and cumbersome business. So a lot of people are
simply trying to get rid of that. And that's where the modern platforms come in, which allow you to
do real-time data integration, particularly in the background, so that it's continuous and you're not
dealing with discrete things that are, you know, up and down and failing all the time. I think
there's a general trend in spend in that layer. So I think that's how we actually
view it. And we think real time is definitely, you know, indispensable. I mean, people who are under 30 get very confused when they get messages,
you know, a day later when they're intending to, you know, browse for something or make a purchase. They just don't simply relate to it. So I think that trend is definitely here
to stay. And that actually speaks very well for us.
I hope you enjoyed the podcast.
If you like my work, you can follow Linked Data Orchestration on Twitter, LinkedIn and Facebook.