In The Arena by TechArena - Cloud to Edge Data Observability with Calyptia’s Eduardo Silva Pereira
Episode Date: August 9, 2023. TechArena host Allyson Klein chats with Calyptia co-founder and CEO Eduardo Silva Pereira regarding how his company grew from the origins of the Fluent Bit open source effort favored by the largest cloud providers to deliver a full-featured commercial solution for data observability.
Transcript
Welcome to the Tech Arena, featuring authentic discussions between tech's leading innovators
and our host, Allyson Klein.
Now let's step into the arena.
Welcome to the Tech Arena. My name is Allyson Klein. And today I am really excited by our guest, Eduardo Silva Pereira, co-founder and CEO of Calyptia. Welcome to the show, Eduardo.
Thank you, Allyson. There is a lot of news at the company, and we have a lot to talk about in terms of the market and the company. And I think this week
is a good time for that, because we have great news around the ecosystem, data processing, all the topics
that we're going to touch on. So yeah, I'm happy to be here and be able to share this with you.
For those of our audience that listen to the Tech Arena,
they know that I'm a huge fan of observability.
So I was really excited to have you on the show.
Why don't we just start with an introduction of you and Calyptia
and why you decided to form this company?
Yeah, my name is Eduardo Silva Pereira.
I'm originally from Chile, now based in Costa Rica.
And I used to work for Oracle for seven years.
And toward the end I was doing kernel patching,
operating system things.
And after that, I joined a startup
that was called Treasure Data,
where they were providing one of the first SaaS platforms
to host Hadoop as a service, right?
And as part of that journey,
they created this agent
for multiple environments that was
called Fluentd.
I'm talking about 2011, 2012.
I joined that team in 2014, and Fluentd was a project that solved one specific problem.
And it's the same problem we have today, but at a different scale.
At that time, I'm talking almost 10 years ago, different applications were
generating a ton of data in different formats.
And when you wanted to do data analysis, it was really hard to concentrate this data into
one place to analyze, run analytics, and extract the value.
And Fluentd was one of the solutions that was created in open source with a pluggable
architecture that could collect data from files, from Twitter feeds, from
firewalls, from all the places that you can imagine. And the community
built around, I don't know, a thousand plugins available to connect and send data
between multiple places. At the same time, when I joined that
team, I created this project called Fluent Bit, because we knew that,
at that time, we're talking about 2014, the IoT space was really hot at that moment.
Everybody was talking about 2020, your microwave will be talking to you and everything will be connected.
But all this software that will run on these devices will generate data.
So the problem that you got in the infrastructure now moves to your house, to your
home, to your office. And you will get the same challenge, right? How do you collect all this data
in a smooth way where you can run analytics at the end? Because that is your final goal.
And Fluent Bit was created for that as a lightweight alternative to Fluentd, part of the same
project. And both projects were donated around 2015, 2016 to the CNCF, when the foundation was
created right after Kubernetes was there.
People started using Fluentd in Kubernetes to move data at scale when Kubernetes clusters
were pretty new.
And after that, there were certain challenges where people said, hey, it's really hard to
process data at a higher scale with Fluentd because it has restrictions in terms of the language it is written in.
It was Ruby at that time, and Fluent Bit was written in C.
So it became an interesting alternative in terms of efficiency, to process and move data at a higher scale while consuming fewer resources. And if you look at the history of generating data from applications, you realize
that data has been always a bottleneck. Not because of data itself, but dealing with data
has been a bottleneck. And today, every company is generating 20, 30% more data from
applications, system services, from hardware, and now it becomes a bottleneck.
So how do I process this at a higher scale?
And with time, Fluentd and Fluent Bit became like the default in the industry.
Now, if you go to any cloud provider, go to, I don't know, AWS, Microsoft Azure,
Google Cloud, all of them run Fluentd or Fluent Bit somehow.
So this is a vendor agnostic tool that moves data at scale in all infrastructure places.
And Fluent Bit, you can run it on a very small Raspberry Pi up to a server that has, I don't
know, 96 CPU cores.
And one of the challenges around data processing is that the first tools to move data were just
able to move data: collect from A, send it to B, and in B, you got the analytics experience.
You can think about Splunk, Elasticsearch, or any type of fancy database with an analytics engine.
And when environments grow and there's more data to process, your destination, your backend where you run
analytics, becomes the next bottleneck. Because the thing is that you just moved the
problem from one place to the other. You just moved the problem from the left to the right.
Now you realize, hey, one interesting approach would be: what if we move certain
analytics pieces that are today on the right over to the left?
The left, people call it the edge, where the data is being generated.
And the Fluentd ecosystem and Fluent Bit also allow you to run queries,
enrich data, do any type of data transformation,
or trigger alerts when something is happening.
Usually, if you talk to any kind of user
that sends data to Loki or Splunk,
most of them are doing alerting.
For example, if the data contains this type of pattern,
like an error or warning,
do something, trigger a message,
ping somebody because some action needs to be done.
But usually that happens in the database, after you ingested all of, I don't know, gigabytes
of data and you indexed that data, and that takes a lot of time.
And then you realize, oh, there's something wrong.
Now with Fluentd and Fluent Bit, you can do this as soon as you collect data, before the data
moves out of that box.
You can trigger an alert.
You can take some action.
And it could be an HTTP call, could be Slack, could be any type of solution.
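[Editor's note: a minimal sketch of the kind of edge alerting pipeline described here, in Fluent Bit's classic configuration format. The log path, the match pattern, and the Slack webhook URL are placeholder assumptions, not values from the conversation.]

    # Tail application logs, keep only records whose "log" field looks like
    # an error or warning, and forward the matches to a Slack webhook.
    [INPUT]
        Name   tail
        Path   /var/log/app/*.log
        Tag    app.logs

    [FILTER]
        Name   grep
        Match  app.logs
        Regex  log (error|warning)

    [OUTPUT]
        Name    slack
        Match   app.logs
        webhook https://hooks.slack.com/services/XXX/YYY/ZZZ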
So one of the biggest problems today is that we have more and more data.
And from a business side, more data does not correlate with more value.
Actually, it's totally the opposite.
I love what you just said. I was thinking
about what you said about my microwave. It's still not talking to me, but many other things are. And
talking to companies, you know, they're overwhelmed with the amount of data and the various locations
that they're hosting that data and trying to seek value. So take me through your thought process as we look at these distributed
computing environments. How are you helping companies specifically? And you talked a little
bit about it in your first answer, but what do you think is the challenge in terms of data volume?
And what do you think the concept of compute gravity from cloud to edge means as it relates to tackling this big challenge of data volume and turning that data volume into value?
Yeah, that's a really good question.
I think that I would try to answer with different pieces here.
I think that there are different challenges associated with data volume. One of them is performance and for different companies, performance
could be a different thing.
For some of them it's how fast I can query that data, while for others it's
how fast I can move this data, right?
Everybody has their own metrics.
But then the common thing across all the environments is moving the data.
You have to move it, right?
And the moment that you move it, that is really expensive in terms of computing
cost, hardware, memory, network, CPU.
For example, you can have a car and everybody talks about how fast a car can be.
Right.
But nobody talks about how much fuel or gas you need.
Are you paying attention to how much you need?
In data, people don't.
They just focus on, I need to go fast at any expense.
And then you have higher costs, your CPU and
your servers start to struggle, and it becomes a big problem.
The other thing with data is that, from the user side, we found
with our customers that some of them don't know how much data they have.
They also don't know what type of data it is.
And for example, and this is from the company side, from Calyptia:
we have one big customer that plays in the security
space. They collect thousands and thousands
of firewall messages,
and the moment that they deployed our technology,
they said, oh, now
I'm not optimizing
my environment, now I'm collecting more data
than before. Yes, because the
technology that you used to have was not
able to scale. Now you're collecting more data, but now you can make some decisions on that.
Do I need this data?
Yeah, if you cannot collect it, you don't know what you have.
On the user side, yeah, they are still going through this discovery process: they need to learn to identify what is useful for them and what is not.
And there's no pattern for all the customers, all the users.
Everything is different.
Every use case is different.
And we have seen some efforts in the market around
trying to automate this discovery process
with machine learning, time series forecasting.
And some of them somehow work,
but at some point, every user is different,
every use case and every business is different.
So it's really hard to get the right insights.
And the other thing with data volume is that all companies are turning into software companies.
Right?
That's a fact.
Everybody becomes a software company.
You get more software.
You have more data to process.
And then you got the next problem.
Yeah, analytics.
Also, your engineering teams are investing, maybe wrongly,
the time that they have in moving the data
instead of having the right tools or the right partners
to take care of that.
Because moving the data involves a lot of complexity.
Networks fail a lot.
What happens when the network goes down?
That generates back pressure.
You cannot move the data fast enough.
You have to store that data into the file system.
And at some point, if things go wrong, you have to sacrifice data.
And which data do you have to sacrifice?
So there's a lot of decisions in the middle when moving this data at scale.
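[Editor's note: a minimal sketch of the buffering decisions mentioned here, using Fluent Bit's filesystem storage options so data survives back pressure, with an explicit cap that bounds how much is kept when the destination stays down. The paths, host, and limit are placeholder assumptions.]

    [SERVICE]
        # Persist buffered chunks to disk instead of keeping them only in memory.
        storage.path  /var/log/flb-storage/

    [INPUT]
        Name          tail
        Path          /var/log/app/*.log
        storage.type  filesystem

    [OUTPUT]
        Name                      es
        Match                     *
        Host                      elasticsearch.example.internal
        # If the backend is unreachable, queue at most this much data on disk
        # before older chunks are sacrificed.
        storage.total_limit_size  1G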
But if you think that you can take some decisions on the edge where the data has been generated,
right, you can reduce noise.
You can try to implement your own criteria to say, this data is not relevant for me.
Don't send it.
Report that it exists.
Yeah, that's fine.
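[Editor's note: a minimal sketch of that kind of edge filtering as a Fluent Bit grep filter; the tag and the field/value being excluded are placeholder assumptions.]

    [FILTER]
        Name     grep
        Match    app.logs
        # Drop records whose "level" field is "debug"; everything else passes through.
        Exclude  level debug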
So I don't know if that answered
the main question that you had.
Yeah, I think so.
You know, the other thing that I'm wondering is,
you know, you've got great traction
with the open source solutions.
Fluent Bit and Fluentd are being utilized
by all of the major cloud service providers.
They've got broad ubiquity across the cloud.
What are you delivering
with your commercially available solution with
Calyptia that takes the solution even further? And what has the industry response been thus far?
Yeah, good question. One of the things that we discovered from our users, that they value the most
from a product like Calyptia Core, which is our flagship product,
is the simplicity of integrating systems. It's not the same to say, deploy this product,
connect, create pipelines, integrate all your systems in, I don't know, 10 minutes, versus go configure Fluent Bit by hand and take a couple of days between testing and trying.
That's incredible. Yeah.
I think that everything, even in life, is about simplification, right? And this is a very low-level product for the infrastructure, right?
But if you look at the scope of people who need to consume data and are in the observability
space somehow, we consider that observability is just one part of what we cover as a company.
They just need to analyze the data.
They just need a fast way to enrich data or discard it.
And they want to remove the dependency on IT people
to handle that for them.
Because it takes time.
And trust me, nobody likes to configure pipelines manually.
I would love to say, yeah, oh, I love this tool
because I can configure all this stuff by hand.
Okay, yeah.
People love it because it solves the problem.
But that doesn't mean that it's an exciting experience.
I think an exciting experience doesn't come from a configuration file.
It comes from a nice UI that turns, I don't know,
hours of work into two clicks.
Now, Eduardo, you said that you had some exciting news this week
from the ecosystem and how you're engaging with partners.
Do you want to share that?
Yeah, of course.
For example, I suspect that while you were at Intel for many years,
you were able to learn a bit about what the old Linux experience was like for users.
I started using Linux around '99, right?
I think when I started college.
And the experience around 20 years ago was really raw, right?
If you want to use a modem or connect to a network,
you have to get a new kernel.
You have to compile it yourself, and so on.
At that age, I used to have a lot of time, free time,
which I don't have now.
And with time, different solutions started to appear
around Linux, the distributions,
that simplified the experience
and made sure that you could use it without investing hours.
Now, in the cloud space today, if we fast forward to 2014, 2015, with the launch of Kubernetes, it started like the very old Linux days too.
The experience was very raw, right?
You have to learn a bunch of commands, learn about networking, storage, API, how things connect.
Yeah, it's really cool from an architectural point of view, but you have to
manage a lot of things manually. And then you have this new concept, same as in Linux, of distributions,
things that come with simplicity, like OpenShift, like Rancher, right? And they give you a great
experience on top of Kubernetes as a platform. So you're not using Kubernetes, you're using OpenShift
or using Rancher with a UI, and everything looks good.
But now you deploy your applications and you say, oh, great,
I'm using this platform, now I'm going to move to...
In observability, you get the same experience as in
Linux 20 years ago, right?
And that is the honest truth today.
People, yeah, vendors can say whatever they want, but that's true.
People have a huge pain when it's time to monitor their applications.
I'm not saying that there's no solution.
There are solutions, but they are not vendor agnostic.
Yeah.
You have to get married to one vendor, right?
Right.
And this is where Calyptia Core comes in, to bring you that experience,
this observability type of distribution, to manage your pipelines in a very easy way.
So you can focus on extracting value and doing what you want to accomplish at the end.
You want to analyze data.
You don't want to create pipelines.
You want to analyze your data.
You create pipelines because that's the way to do it.
Right.
And we launched a couple of new features in the product, right?
It can be deployed inside Kubernetes or outside as a standalone solution.
But one of the things that it solves is cluster logging.
So imagine that you have a Kubernetes cluster and you want to move the data from your nodes to a database.
Yeah, you deploy Fluent Bit, you create a DaemonSet, or you use the Helm chart.
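[Editor's note: a minimal sketch of the Helm route mentioned here, assuming the community fluent-bit chart with default values, which installs Fluent Bit as a DaemonSet on every node.]

    # Add the Fluent Bit Helm repository and install the chart.
    helm repo add fluent https://fluent.github.io/helm-charts
    helm repo update
    helm install fluent-bit fluent/fluent-bit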
And that works. But the moment that your company starts segmenting
the logic of how the platform is used with namespaces,
or in OpenShift's case, projects,
they start to assign certain things to a namespace,
which is a logical segmentation inside the cluster.
And those guys don't know anything else about other namespaces.
That's where they work.
That's what they know.
But if they are isolated, they still have this need of, oh, I need to have my own telemetry pipeline.
And the telemetry pipeline for this team is different from the other's, because each may send the data to different endpoints.
Splunk, Elastic, Kafka.
Every company uses multiple.
They don't use one.
Most of them.
And Calyptia Core now comes with native support for these namespaces.
So you just get the product installed on Kubernetes and you just say, oh, I'm
using Kubernetes, enable cluster logging, and now create pipelines for your namespaces.
So you can give autonomy to your teams, and you don't depend on the IT operations folks.
And I think that's one of the major values that our customers are finding with this type of tool: that it works.
It works at auto scale, because it's hard to predict when you will have a higher volume of data, but this can auto-scale, and it can reconcile.
So it can auto-heal if something goes wrong.
Those are the kinds of things that deliver a lot of value.
On the other side, so far we're just talking about moving data from A to B.
But I told you at the beginning that one of the big values
is to enable companies to bring the business logic into the pipeline.
One example: we have a company that processes transactions,
and those transactions
come with a lot of credit card numbers, right?
But these teams have very specific needs.
They say, oh, if the transaction or the credit card number belongs to American Express, Visa,
or MasterCard, I have to take a different action for each of those.
And where are you going to take that action? Once you ingest the data into a database,
or do you want to do it on the node, on the edge? And that is possible. With Calyptia Core, we allow you
to do all this processing and implement all this logic with a simple UI, with a few clicks. And
if a few clicks are not enough, because it's very custom logic, you can write some Lua
script, in the Lua language. And if you don't know Lua, this is where we're testing out: we
have a simple interface with AI, where you write in natural language, I want to do X, Y, Z with this data,
and it will generate the processing rule for you. Nice, nice.
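[Editor's note: a minimal sketch of what such a custom processing rule could look like as a Fluent Bit Lua filter. The record field names, the brand-matching rules, and the function name are illustrative assumptions, not details from the conversation or from Calyptia Core.]

    -- Classify each record by card brand so downstream outputs can route it differently.
    function classify_card(tag, timestamp, record)
        local number = record["card_number"]
        if number == nil then
            -- No card number present: keep the record unmodified.
            return 0, timestamp, record
        end
        if string.match(number, "^4") then
            record["card_brand"] = "visa"
        elseif string.match(number, "^5[1-5]") then
            record["card_brand"] = "mastercard"
        elseif string.match(number, "^3[47]") then
            record["card_brand"] = "amex"
        else
            record["card_brand"] = "other"
        end
        -- Return code 1 tells Fluent Bit the record was modified.
        return 1, timestamp, record
    end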
That really opens it up to a lot of teams that may not have those technical skills.
When you were talking, you know, one of the things that I've talked about on the program
is that observability is such a key capability to the future of distributed computing.
And I'm sure that you have some goals with Calyptia.
Where do you see the company going in the next 18 months?
And what do you want to be known for as you continue to make progress in delivering solutions
and gaining customers?
Oh, great.
I love those type of questions.
I always consider that we have been lucky enough to be in a position to execute on innovation.
And we have a track record of creating things that add value to the market from the open source area.
The same happened with Fluent Bit.
Now, if you ask me what our goal with Calyptia is, I think it is to simplify people's lives when it comes to extracting value from data.
But to get to that point, we have to go from the bottom, right?
Solve the infrastructure problems, level up the user.
We started with an agent and a configuration file.
Now we're playing with auto-healing, auto-balancing.
Now you have a UI.
Now you have processing rules on top of that.
Now you have some kind of AI automation, if that is not enough.
So from a product perspective, we want the user to be able to achieve their goals in the smoothest way.
We want Calyptia to be remembered as a company for that, right?
If I'm talking 20 years in the future, when people think about Calyptia: hey, these guys solved everything around moving data at scale, and nobody else has to think again about how that works.
Because today, every company has to think, okay, how does data movement work, what are the agents, what is the tooling that we need?
And we want to come up with a platform that abstracts everything. Now, same as the moment that people now see OpenShift
or see Kubernetes, they don't think of an API server,
they don't think of a worker node,
they just think of Kubernetes just like a platform
where you can deploy applications.
And that's it, magically things happen.
That's where we wanna go to with our products.
And the team that we have at Calyptia is very interesting.
Most of us have a strong background in open source.
So we've got people who were hacking on different projects
at Canonical, I came from Fluent,
I used to have my own other projects.
My co-founder, Anurag, came from the Elasticsearch team too.
We are a team of builders with a strong open source
background, and we focus on solving problems and innovating on that.
I think we are lucky enough to have gotten traction in open source and with customers now, because
we are not in a position of, hey, we're shipping something to accomplish X, Y, Z.
We are redefining how things can be done in the future.
And I think that's what personally excites me a lot.
We are defining the vision of how things should be done.
It could be right, it could be wrong; customers and users will say.
And now from an investment perspective, as a company, one thing is products,
but also, well, we just started other open source projects to
visualize data and add more value where we don't have the right tooling in the market today.
We have options, but I think that all options can always be better.
This is really exciting, Eduardo.
I love what you said about Linux and observability needing to grow up and be more user-friendly. And it sounds like you've assembled the right software team to get us there and to make
that much more widely deployable and usable.
One final question for you.
Where can folks find out more about you and the Calyptia team and stay in touch and engage
in deploying your solutions?
Yeah, great.
So we have, of course, social channels. We have our Twitter and
Instagram accounts, which are for more informal communication. Now we are 100% committed to
improving the Fluent Bit community and the community of Calyptia. So we are investing
in having a healthy community in the Fluent Bit project. So we have these bi-weekly calls with our users,
and companies using the technology,
and that's where we are wearing our OSS hat, right?
Hey, you've got problems with Fluent Bit?
Maybe we can help, and we have maintainers
from other companies like Google, AWS,
so that everybody can find,
oh, I've got this problem, I have this problem too,
or maybe we should work together
and come up with a solution.
That's the way that we define the future for that.
And for Calyptia also, together with Carolina, our head of marketing,
we will launch a new way to connect with our community through Slack and other channels
that we're going to be starting maybe next week, where we can engage with the community
more in an open way. Because I think what we learned from open source and from many years ago,
actually, if you look at Fluentd and you look at Fluent Bit,
they are not just what Sada had in mind when he created Fluentd,
or what I had in mind when I created Fluent Bit, right?
We just built some foundations.
But after that, we got hundreds of people who have contributed with ideas and with code.
And everything that you see in Fluent Bit today is thanks to the feedback loop that
we get with users.
And the same thing we plan to do with Calyptia.
We don't want to just push a solution that might not make sense because listening to
users, community, and customers is important.
Eduardo, thank you so much for being on the Tech Arena today.
It was a real pleasure.
I can't wait to see where Calyptia goes
and would love to have you back on the show sometime.
Yeah, thanks so much for the invitation.
I'll be happy to come back,
share about the news
or talk any type of things
around open source company technology.
So whatever you want, I'm here.
Thanks for joining the Tech Arena. Subscribe and engage at our website,
thetecharena.net. All content is copyright by the Tech Arena.