Orchestrate all the Things - Cloud modernization and real-time data is how you cut down costs during downturns according to Striim. Featuring Alok Pareek, Striim Co-founder and EVP of products
Episode Date: November 30, 2022
Can technology, and real-time technology in particular, help companies achieve savings during economic hardship? Alok Pareek thinks it can. Pareek is the Co-founder and EVP of products of Striim, a vendor whose goal and motto is to "help companies make data useful the instant it’s born". Depending on which angle you look at it, you could say that Pareek is either biased or in the know. Either way, it was not so long ago that real-time data, or streaming data as this market is also called, was estimated to be worth billions. But then again, as the recent wave of layoffs and market capitalization losses goes to show, many projections around technology are off the mark. Could real-time data be different? Where does cloud modernization come into play and how does Striim's offering relate to that? As Striim today announced the availability of its fully managed Striim Cloud service on Amazon Web Services (AWS), we connected with Pareek to discuss.
Transcript
Welcome to the Orchestrate All the Things podcast.
I'm George Anadiotis and we'll be connecting the dots together.
Can technology, and real-time technology in particular,
help companies achieve savings during economic hardship?
Alok Pareek thinks it can.
Pareek is the co-founder and EVP of Products of Striim,
a vendor whose goal and motto is to help companies make data useful
the instant it's born.
Depending on which angle you look at it, you could say that Pareek is either biased or in the know.
Either way, it wasn't so long ago that real-time data, or streaming data as this market is also called,
was estimated to be worth billions.
But then again, as the recent wave of layoffs and market capitalization losses goes to show,
many projections around
technology are off the mark. Could real-time data be different? Where does cloud modernization come
into play and how does Striim's offering relate to that? As Striim today announced the availability
of its fully managed Striim Cloud service on Amazon Web Services, we connected with Pareek to discuss.
I hope you will enjoy the podcast.
If you like my work, you can follow Linked Data Orchestration on Twitter, LinkedIn, and Facebook.
I'm one of the founders of Striim, and I run all of the product areas, including engineering
and product management and strategy.
And by way of background, I used to be the chief technology officer of a company called GoldenGate Software. Are you familiar with GoldenGate, George?
I came across some text that said a few words about the founders, mentioning that the company was founded by people who had done such and such, including having founded WebLogic and sold it to BEA. And I was a little bit touched by that, because WebLogic was a technology I used to work with very extensively, and I have good memories of doing so. So I looked it up a bit further, and it turned out that it wasn't you who did that, actually; you have a different kind of background. But still, it was an interesting touch for me.
Yeah, it was interesting. My co-founder Ali Kutay was
the CEO of WebLogic, and then went through the BEA acquisition.
And then ultimately, Oracle ended up acquiring that.
And Oracle also ended up acquiring GoldenGate.
So it was interesting.
So, as for Oracle's middleware, at least, these are two pretty significant products in their
product portfolio, with easily billions of dollars in revenue
on an annual basis, actually, if you take a look at maintenance revenue now.
In terms of background, I spent a number of years at Oracle,
primarily in the database development team, working on features all the way from,
gosh, when I joined, it was Oracle 6.0.36,
to Oracle 11g by the time I left.
So a lot of my focus was on recovery and high-speed data movement in the Oracle database
kernel itself. And then I spent several years post-acquisition running Oracle's data integration product
portfolio, which included the data integration product, the replication product, as well as
data quality products that we had
acquired along the way.
Okay, so that's kind of my background.
And let me just quickly get started.
Like I said, I might be referring to some slides.
So just very briefly, on our focus at Striim: we've been doing this for close
to 10 years now.
So we want to power all operations
and decisions in real time.
Real time is our focus.
And we really want to make sure
that companies and enterprises can connect
pretty much with their partners,
with their suppliers, with their customers,
via data flows for the digital economy.
And we aim to make sure that the data is flowing in real time.
And so with that, our product vision is to deliver a unified data integration
and streaming service to the market.
And by unified, we mean it's a single platform that brings together both the real-time data integration
tenets, as well as all of the streaming analytics capabilities.
So we do have a full-blown query processing layer that allows you to operate on the data
in flight.
And why do we want to do that?
It's to get from data to decisions in real time.
If you took a look at a data management landscape today,
you'd probably end up putting together three, four,
or five different products to address what Striim does.
And so our core backbone allows this to be done in an easy-to-use, easily manageable fashion,
so you don't have to actually struggle with, you know, deploying different technologies and bringing
it all together yourself. Just in terms of overall adoption, you know, we've been very successful
in multiple different areas, be it logistics or travel,
customer loyalty in retail, supply chain in retail.
And to that point, it's great that you have a background
with such large companies,
in the Fortune 500 and the like.
So we really are running state-of-the-art
digital services in multiple mission-critical
environments. And I'll talk about that a little bit more. And then the team comes in, like you
are aware, from WebLogic, GoldenGate, Oracle. And our most recent round was our Series C round
that was led by Goldman Sachs. And we also have several other prominent investors,
as you can see. And we have pretty strategic partnerships with Microsoft and Google,
where there are actual agreements with these guys for their customers on several data
modernization projects. And then we also have partnerships with AWS, Databricks, and Snowflake. And this is just an eye chart of our customers.
Obviously, like I mentioned, this is a horizontal technology.
It cuts through financial services, healthcare, transportation, logistics, retail, high-tech,
telco.
So pretty much a wide range of use cases.
And with the use cases, we can think of them in two different ways.
One is where you have the data architects and the ETL developers and data engineers
who are trying to build the foundation, the infrastructure layer,
to make sure that data can be easily made available for consumption across your ecosystem
in real time.
And the real-time part is important because that differentiates us from a lot of the batch
oriented technologies such as Informatica and Talend and all of the products in that
space.
So with that in mind, the key use cases here tend to be cloud data integration, data modernization, trying to put together a customer 360, you know, potentially in like an analytic system.
Many folks are looking for just a real-time change data capture technology, trying to do streaming analytics, streaming ETL, real-time analytics. And this is sort of something that,
as we look across our customer base,
this is how they are using us.
So it's not something that we are going out and saying,
hey, please use us this way.
But because it's a platform
and it has a breadth of functional areas within the product,
these are the common use cases for Striim. So this is more of a technical-oriented use case.
The second one is more of a business user, application user-oriented use case,
where application developers or BI analysts might actually come in and they are trying to deliver,
you know, some value to the business, like, you know, improve customer experience,
you know, better patient care, trying to make sure that across your multiple fraud detection systems there are
still no gaps, and trying to make sure that, you know, you can look across a
spectrum of different products as data is coming in in real time, to add further
logic there.
And the list goes on in terms of fleet cargo
and parts management.
We have one of the largest airlines in the world
as our managed services customer
on the cargo and the part side.
Real-time dynamic pricing,
marketing promotions, omni-channel.
So I'll get into maybe one or two specific use cases later on,
but just wanted to make sure
that these are sort of the core use cases of how we go to market. And then what we're capitalizing on are these
six architectural shifts. This is something that came from McKinsey Digital, you know, a year and
a half ago. They're very well aligned with what we see in the market: a shift from on-premise to cloud-based
data platforms, from batch to real-time data processing, from pre-integrated commercial
solutions to best-of-breed platforms. We see many of the teams trying to actually look at
architectures like data mesh or data fabric, where they're trying to get away from point-to-point connections and decouple data access. Also, coming from the 2015, 2016,
2017 era, when people were busy building a Hadoop-based data lake,
they're trying to move away from that a little bit towards more of a domain-based architecture,
where there are purpose-built applications, which might be called data products, that are housed by the domain
folks who best understand the domain and the data. And then you try to go from rigid data models
to more open formats like JSON and Parquet and Delta and so forth. So we find ourselves in the
middle of these architectural shifts, and we are the enablers here,
you know, as we help our customers.
So a conceptual, simple way to look at this is,
you know, a majority of the businesses today
might have, you know, their applications running,
you know, using a number of different areas
in the stack on premise,
or maybe, you know, more recently in the cloud.
And we help move the data across these applications
and databases and Kafka queues,
or even data coming over the web
to these different endpoints,
cross cloud, multi-cloud in real time.
And that's really how we help enable
a key piece of the digital transformation.
So I'll maybe stop here just for a few seconds
just to make sure at least at the high level,
the picture is sort of clear as to where we fit in, George.
Yeah, it is pretty clear.
And actually, to be honest with you,
that aligns with the understanding I had already
kind of looking up what Striim does.
And even by name, it was pretty clear that you're in the real-time streaming data space.
What I was more interested in, to be honest with you, was, well, what precisely your differentiation
in that space is, let's say, compared to all the other options?
Because as you obviously know,
there's a number of other options in that same domain as well.
So let's try and zero in on that one.
So I would start by asking you,
so do you actually cover both data transfer and data transformation?
So can people do data transformation using Striim?
Yeah, great question.
And I'm just going to land right on the platform itself.
And this will actually show you the various capabilities, as well as I'll point out some
of the unique things that we have done, which sort of take us away from many of the traditional
players in this market.
So let's actually start with the data sources. So as you take a look at
it, we do tend to support hundreds of different data sources. And these could be databases,
log files, messaging systems like Kafka or IBM MQ or JMS-based systems. Data coming in
from sensors, data coming in from just over the web.
So in a straightforward, you know,
data integration type of a scenario,
you know, what we do is we do continuous data collection as opposed to sort of batch data collection.
And that's a key piece.
I'll just get into that in a second.
And this technology is called CDC, change data capture.
So there are a few players in this market.
Clearly, you know, there's Oracle GoldenGate, which was done by us, and then now housed at Oracle.
IBM has a product as part of the InfoSphere family; they had acquired a Canadian company called
DataMirror, which addresses that space.
And there are a few smaller players that we had seen a while ago, which constitute the
remainder.
One is now owned by Qlik.
One is now owned by Fivetran.
So we're sort of an independent, you know, standalone company that actually provides this.
In the simpler use case, we simply deliver, you know,
these change streams onto a number of different target systems.
So, for example, a MongoDB to Azure Event Hubs, or a MongoDB to Snowflake, or an Oracle to Google BigQuery: these might be simple, straight-through real-time data integrations. But you might also ask: what happens as the data is moving? And here's where the capability of the platform can be leveraged to do data filtering, data aggregation, transformation, and enrichment. Contrast that with the streaming technology out there today. So for example, let's take Confluent, or let's take Apache Flink, or any kind of a streaming system like Spark Streaming. Usually
they do not have the CDC capability. So they would still end up using
Striim to move data in, and we have many customers doing this. If you have, let's say,
data coming in from a CRM system or a Salesforce application, getting that into just Kafka, you would actually use Striim. And then beyond that, you might use Confluent,
which is actually supporting the Kafka platform or the Kafka cluster itself. Now, once we bring the data in, it's your choice, because we do offer all of the SQL capabilities
now, once we bring the data in, it's your choice because we do offer all of the SQL capabilities
for doing continuous query processing, right?
So this is where, you know,
you don't have to now put together your own,
you know, streaming layer,
which has to do with the processing part of it.
And this is a key separation.
I find in the industry,
the most common confusion is a lot of people think
that if you do streaming ingestion
with a backbone, like, you know,
Kafka or Apache Pulsar,
then you are done with streaming. But there's also the stream processing part of it. And there's also
the in and out pieces of it. Like, why are you actually doing it? For whose purpose? Where are
you getting the data from? Where are you delivering the data to? So the unique thing that we are doing
is we have taken the endpoints and we have fused that together to make sure that
all of the streaming capability that you need from, you know, ingestion, particularly from
change data capture set of data sources, which otherwise are very difficult to consume data from
on a continuous basis. You cannot just keep polling database tables all the time because of,
you know, operational concerns. So from streaming ingestion and CDC to stream processing,
to stream storage, to stream analytics,
and then finally to actually stream visualization
and delivery, we are taking all of these different
capabilities in the streaming system.
And we have a comprehensive platform
that addresses all of these.
And that's sort of like what's different.
That's why we call it unified.
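To make the change data capture idea concrete, here is a minimal, hypothetical Python sketch of what CDC consumption conceptually involves. The event shape, field names, and apply logic are illustrative assumptions, not Striim's actual format or API: rather than polling tables, a consumer applies a continuous stream of insert, update, and delete events to a target.

```python
# Hypothetical sketch of change data capture (CDC) consumption.
# The event shape and apply logic are illustrative assumptions,
# not Striim's actual format or API.

change_events = [
    {"op": "INSERT", "table": "orders", "key": 101, "after": {"status": "NEW"}},
    {"op": "UPDATE", "table": "orders", "key": 101, "after": {"status": "SHIPPED"}},
    {"op": "DELETE", "table": "orders", "key": 101, "after": None},
]

target = {}  # stands in for Snowflake, BigQuery, a Kafka topic, etc.

def apply_change(event):
    """Apply one change event to the target, keeping it continuously in sync."""
    key = (event["table"], event["key"])
    if event["op"] == "DELETE":
        target.pop(key, None)
    else:  # INSERT and UPDATE carry the new row image
        target[key] = event["after"]

for event in change_events:  # in production this loop never ends
    apply_change(event)

print(target)  # {} -- the row was inserted, updated, then deleted
```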
And I'll give you an example of enrichment. Let's say you wanted to actually join reference data as data is coming through.
In a retail environment, for example, data is coming in from a digital
channel. Let's say you come to a website and then you place an order, and the order management
system is another application that could be running on an IBM mainframe. Getting these two events together,
either to spot patterns
or to alert somebody
because this is a high-profile customer,
you quickly want to go ahead
and cross-correlate that in real time.
And that's where this enrichment piece comes in.
You have reference data that you can preload
and you can actually combine that.
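As a rough illustration of that enrichment step, here is a hypothetical Python sketch, not Striim's API: reference data is preloaded into memory and joined against each event in flight, with all field names assumed for the example.

```python
# Hypothetical in-flight enrichment: join streaming events against
# preloaded reference data. Field names are illustrative assumptions.

customers = {  # reference data, preloaded once (e.g., from a CRM extract)
    "c42": {"name": "Acme Corp", "tier": "platinum"},
}

def enrich(order_event):
    """Attach customer attributes to an order event as it streams through."""
    ref = customers.get(order_event["customer_id"], {})
    return {**order_event, **ref}

event = {"order_id": 7, "customer_id": "c42", "amount": 250.0}
enriched = enrich(event)

if enriched.get("tier") == "platinum":
    print(f"ALERT: high-profile customer order {enriched['order_id']}")
```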
And as you go through, right,
you are able to do continuous queries and windowing.
And the continuous queries have to do with the fact that
I can look at data as it's coming in every minute
or every 15 minutes, every five minutes
and have a set of queries that are specified.
So this is a push-based notification to me
as opposed to me polling, right?
And I wanna be careful about these two use cases.
One of them is I move data in real time to, let's say, Google BigQuery, and I run a set
of applications using queries on Google.
The second is, as I'm moving the data using Striim to Google BigQuery, I can define
data windows of one minute, and I can say, hey, just give me sales from a specific store,
if I have hundreds of stores all over North America. And then if the sales are underwhelming,
then go alert somebody in real time before it even makes it to Google BigQuery. So that's a
very different paradigm and we are bringing that into the integration landscape here.
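Here is a hypothetical Python sketch of that second paradigm; the threshold, field names, and alert channel are illustrative assumptions. A continuous query evaluates each one-minute window of sales and pushes an alert before the data ever lands in the warehouse.

```python
# Hypothetical sketch of a one-minute tumbling window over a sales stream,
# alerting on underperforming stores before the data lands in the warehouse.
# The threshold and field names are illustrative assumptions.

from collections import defaultdict

THRESHOLD = 1000.0  # minimum expected sales per store per window

def process_window(events):
    """The continuous query for one window: total sales per store, then alert."""
    totals = defaultdict(float)
    for e in events:
        totals[e["store"]] += e["amount"]
    for store, total in totals.items():
        if total < THRESHOLD:
            print(f"ALERT: store {store} sold only {total:.2f} this window")

# One window's worth of events; in production they arrive continuously,
# and results are pushed out rather than polled for.
window = [
    {"store": "NYC-01", "amount": 1500.0},
    {"store": "SEA-07", "amount": 320.0},
]
process_window(window)  # -> ALERT for SEA-07
```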
And then the more advanced use cases tend to be where you could actually do multi-stream
correlation. You could do pattern matching. So windows are obviously either time-based or they
are batch-based. So you could say that maybe every hour do this for me or every 100 events do this
for me. You could also define pattern-based windows, where you say, hey, my window starts
with a specific punctuation,
that is, a specific pattern starting with a keyword, and then it also ends with a pattern.
And then that becomes your window because you want to analyze things there.
And that's super useful for doing session management and so forth in real time.
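A minimal Python sketch of such a pattern-delimited window follows, with the start and end markers assumed purely for illustration: the window opens on a start pattern and closes on an end pattern, which is what makes it useful for sessionization.

```python
# Hypothetical pattern-delimited window: instead of time or event counts,
# the window opens on a start marker and closes on an end marker.
# The marker values are illustrative assumptions.

events = ["LOGIN", "view:home", "view:cart", "purchase", "LOGOUT",
          "LOGIN", "view:home", "LOGOUT"]

def sessions(stream, start="LOGIN", end="LOGOUT"):
    """Yield one window (a list of events) per start/end-delimited session."""
    window = None
    for e in stream:
        if e == start:
            window = []                 # start pattern seen: open a new window
        elif e == end and window is not None:
            yield window                # end pattern seen: close and emit
            window = None
        elif window is not None:
            window.append(e)

for i, session in enumerate(sessions(events), 1):
    print(f"session {i}: {session}")
# session 1: ['view:home', 'view:cart', 'purchase']
# session 2: ['view:home']
```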
So we've added that piece into the integration pipeline. And one way to consume this is through very simple use cases, like I mentioned, real-time
alerting, real-time triggers, expressing real-time ad hoc queries, building real-time dashboards,
and then incorporating your own machine learning and AI.
Now, that's a lot, but I want to be careful that the way we are coming
to market is we're saying for the real-time pieces of the integration and analytics,
Striim is the platform. For your long-term batch-oriented analytics, where scale is needed,
scale as in I may want to actually go ahead and run a query against one year's worth of data,
that's where now you're moving outside of Striim into systems like Snowflake and BigQuery and so forth.
Does that make sense?
Yes, there's something else that I wanted to ask you specifically about.
So you did mention that the platform does offer a SQL interface as well. At the same time, you also mentioned different options that people have for defining pattern-based matching or time-based matching and so on.
Are those two combined at all in any way?
Or if not, do you have specific SQL extensions that people can use for that?
And how can people find the exact patterns
that they want to match?
Yeah, fantastic.
Great question, George, by the way.
So we do have a variant of SQL
and then we have introduced our own extensions to it.
So for example, to the specific question you asked
about how do you do pattern matching?
So we actually have a function, right, which is really a pattern matching operator. And then,
you know, you can invoke that as a UDF in SQL, and then you actually pass in a
regular expression to apply to the event. And that looks for that specific regular expression within
the event itself. Okay, that's how you actually go ahead and express that logic.
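Conceptually, that operator behaves like the following Python sketch; the function name and event payload are illustrative assumptions, not Striim's UDF API.

```python
# Conceptual sketch of a pattern-matching operator invoked per event, the
# way a UDF would be called from a SQL continuous query. The function name
# and event payload are illustrative assumptions, not Striim's API.

import re

def match_pattern(event_payload: str, pattern: str) -> bool:
    """Return True if the regular expression occurs within the event."""
    return re.search(pattern, event_payload) is not None

# e.g., flag log events that contain an error code like "ERR-0042"
print(match_pattern("2022-11-30 ERR-0042 checkout failed", r"ERR-\d{4}"))  # True
print(match_pattern("2022-11-30 checkout ok", r"ERR-\d{4}"))               # False
```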
If there's something that's difficult to express in SQL, then you can actually extend the
platform: this is primarily a distributed Java-based platform,
so you can link in with your own code as well. So we have an interface, a component
called the open processor, where we actually publish
a set of interfaces for initialization and then runtime and so forth. And then you can actually
get the result back into the platform. And that's how you could actually write your own ML, for
example, or write your own data cleansing rules on the data, or you
could actually do your own auditing with third-party systems within that layer, for example.
I'm citing real examples of what our customers
end up doing with that open processor.
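Since the platform itself is Java-based, the real extension interfaces are Java; purely for consistency with the sketches above, here is the open-processor idea rendered in Python, with hypothetical method names. The platform calls an initialization hook once, then a runtime hook per event, and the returned event flows back into the pipeline.

```python
# Hypothetical rendering of the "open processor" idea: the platform
# publishes initialization and runtime interfaces, you plug in your own
# logic, and results flow back into the pipeline. Method names are
# assumptions; Striim's actual interfaces are Java-based.

class OpenProcessor:
    def initialize(self, config: dict) -> None:
        """Called once before the stream starts (load models, rules, etc.)."""
        raise NotImplementedError

    def process(self, event: dict) -> dict:
        """Called per event; the returned event re-enters the pipeline."""
        raise NotImplementedError

class CleansingProcessor(OpenProcessor):
    """Example custom logic: simple data-cleansing rules."""

    def initialize(self, config):
        self.default_country = config.get("default_country", "US")

    def process(self, event):
        event["email"] = event.get("email", "").strip().lower()
        event.setdefault("country", self.default_country)
        return event

proc = CleansingProcessor()
proc.initialize({"default_country": "US"})
print(proc.process({"email": "  Alice@Example.COM "}))
# {'email': 'alice@example.com', 'country': 'US'}
```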
Okay, so the platform does offer an API,
I guess, that people can use.
However, if I got it right, it's not open source, right?
So it's proprietary and it comes in flavors, basically,
so people can use it on premise or they can use the self-managed cloud version or they
can use the fully managed cloud platform that you offer, which is actually the occasion
for having today's conversation.
So let's focus a little bit on that.
Sure.
Great, great question.
I love the flow.
It's just pretty much in line with what I thought we would want to talk about. So the product itself is offered either as a platform
that you can download and run in your own data center, or you can host it yourself:
we are available in the marketplaces, where we run in our customers' account.
And finally, there are our managed services, where we run it in our account.
And then you basically don't have to worry about any of the manageability, monitoring,
patching, upgrading; all of that stuff you leave up to us, and you're really focused
on your business logic.
So there's some flavors of these solutions as well.
And I think if I just take a step back
on the left side of this slide,
you have Striim Cloud and Striim Platform.
So the Striim Platform,
you can either run on-premise
or in a self-managed cloud
on all three marketplaces,
the popular ones:
Google Cloud, AWS, and Azure.
Striim Cloud is a fully managed SaaS solution. So this is where we run everything for you. And I mentioned the largest airline in
the world is running this thing with us. And then we have also introduced a set of
newer products that we call our data products. So these are very purpose-built, where the idea is
that the end user already knows that I want to actually go
ahead and use Striim to move data for BigQuery analytics.
And in this specific offering, the user experience
is very different.
It's more intent-oriented.
And you can't just do everything under the sun.
So in Striim Cloud, for example,
I could take data from an Oracle and a Postgres system as data sources,
and I could take different tables
using change data capture,
one writing to a Kafka topic,
one writing to Snowflake,
one writing to, say, Amazon S3.
Those capabilities of like,
hey, I want to have full power and do whatever
I want in a real-time integration pipeline, you don't get all of those capabilities in
our data products.
In our data products, we know that you want to do your analytics on BigQuery and you want
real-time data.
So the user experience is actually pretty simple and very tight.
It just pretty much invites you to say,
give us your BigQuery credentials,
what are your sources?
And it automates the entire thing for you.
And that is the big launch that we just did on BigQuery.
We are just this week going to be announcing that
as preview for Snowflake.
And then also to be soon followed by Azure Synapse and Databricks.
So that's the direction we're going in.
This is feedback from our customers where they also want to have
very purpose-oriented data products so that, you know,
we don't give them the entire, you know, flexibility.
I mean, some customers want it, particularly large enterprises.
But we've seen small to medium teams
that don't want to actually go ahead
and understand everything, they're like,
well, just take me to a simple set of UI screens
and let me just configure it.
And then off you go.
And we automate the entire schema creation,
the initial historical load, the change data capture,
the alerting, the monitoring.
And then all you do is sit back and just,
this runs in the background for you. Does that make sense? Yeah, it does. It also brings me to
another question. So now that you have touched upon the different flavors in which people can
use the Striim platform, the actual news that you're about to announce is that you're expanding one of those offerings, actually.
I think it's the one that's managed by Striim, right?
So you are now making it available also on AWS Cloud.
And I think it was previously available on Google Cloud.
And was it also some other?
Was it also Azure?
Yeah, it was.
So I think you're doing things a little bit backwards, in a way.
So most companies would start with AWS and then move from there.
Was there a particular reason why you did it differently?
Yes, yes, that's a very good question.
I saw that in the briefing notes as well.
So, you know, one of the things is that, you know, what's really driving this market, right, is the cloud data integration for analytics market, right?
We see a lot of pull when people are trying to actually go in and, you know, do analytics in the cloud for, let's say, on Azure Synapse or on Databricks or on, you know, maybe BigQuery and Snowflake and so forth.
And, you know, the other thing is our change data capture differentiator is huge. So if you take a look at Azure, right, they have a huge install base of databases in SQL Server, right? If you
take a look at, you know, both Google and Azure, we work with those teams very closely because they were
super interested in getting real-time data into their successful offerings like BigQuery and
on Azure into Postgres there, because they already have a database to offer. So it was a conscious decision. And the
other thing, the second point there, is that our focus was more on enterprises.
And we saw that with AWS, the target market was maybe not necessarily the very large ones when it came to data management.
Those were more in the other two CSPs.
So we started there.
But at the same time, once we had enough of a critical base, now we are moving to
AWS on a managed service. On AWS, we always offered our Striim Platform to be deployed by you.
In fact, we have customers like UPS and all these guys who have actually used it for a long time. And
now is the first time we're doing a managed service there. So that was the reason for that.
Okay, I see. So I gather it has probably a lot to do with the fact
that you mentioned earlier
that you have been working in closer collaboration
with Google and the Microsoft team.
And so you mentioned, for example,
the partnership you have around BigQuery.
And so it was kind of a business-driven decision, let's say.
Absolutely, absolutely.
I think most of the customers that we were seeing
in our on-prem pool were trying to get data
into these two CSPs first.
And they were really naturally telling us.
And our partnerships with these guys, like I mentioned,
were commercial in nature.
And that's something newer on the AWS side. In fact,
we are at re:Invent this week. And with the two others, with both Google and Azure, these have been
existing relationships for many years. So we've been closely collaborating with their product
teams. And that's the reason why we offered our services there first.
I know we are up at 8:30, so, my last two slides. So this is what's upcoming, George. I
kind of talked about it. One thing that I didn't mention is our Striim Developer version. So that's
actually going to be previewed very shortly, as early as this year, and then also going GA in Q1.
So that's actually a freemium offering that lets you get your hands on the platform. And this
is really to target it towards the developer community where they don't want to necessarily
go out and be concerned with payments and so forth. So it has a few swim lanes, but it actually
gives you a great way to interact and start using the product. So we are hoping that that's going
to be a very good, you know, exposure for
Striim to the broader developer community.
We're also coming out with application connectors.
So today we have Salesforce and a few others, but we are broadening that suite of
application connectors based on our customer feedback.
Okay, great. Thanks.
Well, that was in my list of questions anyway. I wanted to ask what's in your roadmap following this announcement that you're about to make.
And the other thing I wanted to ask was, well, your take on the broader landscape, let's say, of real-time data or streaming or whatever it is you want to call it. So up until recently, the analysts have been quite optimistic
about the outlook for the market.
And well, as you also mentioned in the introduction,
it's sort of the new paradigm, let's say,
where people want to be able to respond to changes
and events that happen in real time
and therefore platforms such as yours.
However, the economic climate is not the same as it used to be when those forecasts were
originally made.
And we've seen a number of layoffs in the tech industry recently and a general downturn
in the economy.
Do you think that these forecasts are still valid?
Do you see some kind of a special way, let's say, that this economic climate will impact
your industry and your domain as well?
Yeah, I mean, good question, George.
And again, I wish, you know, there was a yes or no Boolean type of an answer.
But I can give you my opinion.
I think in general, what we have seen is that, because Striim particularly tends to play
more in the mission-critical landscape,
we are right in the heart of things, because this is a continuous pipeline.
So if anything, when the broader market sentiment is what we are seeing
with maybe some of the things in the industry slowing down,
I think technology, particularly real-time technology
is becoming more and more prevalent,
because we see our customers
actually accelerating their cloud modernization initiatives,
because I think that's how they save costs. Nobody wants to manage their own infrastructure and so forth. So has it impacted us? Maybe a
little bit where people have slowed down their decision-making process and they're taking more
time. So that I think we've seen some glimpses of that. But at the same time, I think if I just go
maybe a 12 to 24-month horizon, I'm not that worried about that. I think we are
seeing, at least for the real-time parts of it, because, you know, 80, 90% of ETL is still
homegrown and on-prem and, you know, poorly executed through these scripts that often fail.
You know, that's a very labor-oriented and cumbersome business. So a lot of people are
simply trying to get rid of that. And that's where the modern platforms come in, which allow you to
do real-time data integration, particularly in the background, so that it's continuous and you're not
dealing with discrete things that are, you know, up and down and failing all the time. I think
there's a general trend in spend in that layer. So I think that's how we actually
view it. And we think real time is definitely, you know, indispensable. I mean, people who are under 30 get very confused when they get messages,
you know, a day later when they're intending to, you know, browse for something or make a purchase. They just don't simply relate to it. So I think that trend is definitely here
to stay. And that actually speaks very well for us.
I hope you enjoyed the podcast.
If you like my work, you can follow Linked Data Orchestration on Twitter, LinkedIn and Facebook.