Orchestrate all the Things - From Raw Performance to Price Performance: A Decade of Evolution at ScyllaDB. Featuring Felipe Mendes and Guilherme Nogueira
Episode Date: May 5, 2025. In business, they say it takes ten years to become an overnight success. In technology, they say it takes ten years to build a file system. ScyllaDB is in the technology business, offering a distributed NoSQL database that is monstrously fast and scalable. It turns out that it also takes ten years or more to build a successful database. This is something that Felipe Mendes and Guilherme Nogueira know well. Mendes and Nogueira are Technical Directors at ScyllaDB, working directly on the product as well as consulting clients. Recently, they presented some of the things they've been working on at ScyllaDB's Monster Scale Summit, and they shared their insights in an exclusive fireside chat. Read the article published on ScyllaDB's blog here: https://www.scylladb.com/2025/05/05/from-raw-performance-to-price-performance/ #NoSQL #Database #DatabaseEvolution #RaftProtocol #Cloud #DataConsistency #DatabaseScaling #TechInnovation #Opensource
Transcript
Welcome to Orchestrate All The Things.
I'm George Anadiotis, and we'll be connecting the dots together.
Stories about technology, data, AI and media and how they flow into each other, saving our lives.
In business, they say it takes 10 years to become an overnight success.
In technology, they say it takes 10 years to build a file system.
ScyllaDB is in the technology business, offering a distributed NoSQL database that
is monstrously fast and scalable.
It turns out that it also takes 10 years or more
to build a successful database.
This is something that Felipe Mendes and Guilherme Nogueira
know well.
Mendes and Nogueira are technical directors at ScyllaDB,
working directly on the product, as well as consulting clients.
Recently, they presented some of the things they have been working on at ScyllaDB's Monster Scale
Summit, and they shared their insights in an exclusive fireside chat. I hope you will enjoy
this. If you like my work on Orchestrate All The Things, you can subscribe to my podcast,
available on all major platforms.
My self-published newsletter is also syndicated on Substack, Hacker.me, Medium and DZone,
or you can follow Orchestrate All The Things on your social media of choice.
So I'm Guilherme Nogueira, I'm a technical director here at ScyllaDB, and I've been a Solutions Architect
here, so I've had a lot of contact with prospects
that are really looking at ScyllaDB to learn more
and possibly use it in production.
And for that matter, ScyllaDB is a really fast
NoSQL database, really aimed towards performance
and extreme scale, that allows workloads
to scale to multiple millions of operations per second
at really low tail latency.
As for me, I am Felipe Mendes.
Just like Guilherme, I also work at ScyllaDB as a technical director.
I've been here at the company for about four years now. Wow, time flies.
But I mean, as Guilherme said, we call him Gui,
just so you know. I love this database. I think its architecture is really beautiful. And we can
definitely discuss more about what really makes ScyllaDB unique on its own. Things like
its shard-per-core architecture, its unique cache. But as for my time
here at ScyllaDB, I co-authored a book called Database Performance at Scale. I also contributed
code to ScyllaDB. So I try to be pretty much involved in all the things, in multiple areas
here inside the company. Nice to meet you, George.
Great.
Well, thank you.
Thank you both for the introduction.
And I guess a little bit of background about myself,
contextual background maybe in order.
So I just realized just by browsing your recent event,
actually, that it's been 10 years that ScyllaDB has been
around. And out of those 10 years, I've been in one way or another involved, let's say,
or familiar with this database for eight of those years. So it was 2017 when I first became aware. It was also the time that I happened to meet ScyllaDB's
founder and CEO, or co-founder actually, Dor Laor. And so I've been kind of keeping track as much as
one can keep track, you know, on a kind of yearly, let's say, check, or sometimes I have to admit even less than yearly. But I'm fairly familiar with what ScyllaDB does,
and also its evolution over the years.
And I realized that because apparently of the fact
that it's been 10 years since its inception,
let's say, at your recent ScyllaDB event,
there were a number of talks that were actually dedicated
precisely to this topic.
So the evolution of the database over the years and the major milestones.
And I think, Felipe, one of those talks was actually yours.
So you're probably the right person to ask to give us just a bit of a review,
let's say, of ScyllaDB's evolution over the years
and what you think were the
major milestones along this course? Of course, I mean I think that's a really great question
and this was in fact the keynote that our CEO made. He made an analogy during his
keynote where he said that, well, it takes 10 years for one to develop a file system. Well, it turns out that it also takes 10 years, and even more, for one to build a database.
So as you said, during my talk at Monster Scale Summit,
I also gave a very quick overview of how I actually see the multiple ScyllaDB generations.
And when ScyllaDB first started,
it was all about raw performance, right?
We wanted to be, and we still are,
the fastest NoSQL database available in the market.
And we did that.
One analogy that I like to make
without going too deep into the weeds
is basically imagine that we were basically a
team of operating system engineers and we basically brought all this knowledge
on how to build really low-level high-performance stuff but to the realm
of databases, and that's how ScyllaDB started. That's how ideas such as the shard-per-core architecture
essentially began. And back at the time, we were basically able to break several records.
However, as you can imagine, simply raw speed does not really make it a good database. We needed to
be compatible. We needed to have features,
things like materialized views, secondary indexes, we needed to have integrations with other
third-party solutions which are widely used in the market. And that's basically when we started
going through our second generation, where basically ScyllaDB started to actually catch up
with the Cassandra protocol.
We call it CQL, the Cassandra Query Language,
and we started addressing all the technical debt
that we had.
A few years after, we eventually landed
into our third generation,
which basically marked our shift to the cloud.
That's when we announced our fully managed
ScyllaDB Cloud service, and we built several features
for customers to integrate their existing
ScyllaDB deployments with their upstream systems,
things like Apache Kafka, Apache Spark, Elasticsearch.
That's when our change data capture was born. Another feature that many
people don't know about, and Gui can perhaps talk a little bit more about later, is basically
ScyllaDB Alternator. So we also basically have a fully compatible DynamoDB API, which is what we call ScyllaDB Alternator today. And the next generation of ScyllaDB
was basically when we started breaking apart from some of the deficiencies, that's how I would at
least refer to it, deficiencies existing in the Cassandra architecture, where we basically started introducing Raft, and it basically marked our road
to eventually getting to a strongly consistent system. The next generation is basically
where we currently stand, which is our road towards elasticity with tablets.
So lots of features, lots of very interesting things
for us to discuss.
I hope I teased some words which, unfortunately,
some of the audience may not know about,
but I tried to be very succinct.
Yes, indeed.
So thanks, thanks, Felipe.
And just to pick up on some of the things you said,
that again, through my own involvement,
let's say, as superficial as it has been with the database,
I've been able to identify, let's say,
some of the themes over the years.
And one of the things that you mentioned,
so ScyllaDB Cloud has definitely been one of those.
And so the evolution of ScyllaDB Cloud, and the fact that ScyllaDB
has been increasingly used as a cache, and all the migration paths from other data management
systems and databases. These were the key themes that we identified
in the previous ScyllaDB chat that we had with Dor.
And looking at this year, Monster Scale Summit, it seems that these are still there.
And some of the key themes that have always characterized
ScyllaDB are also there.
So cost reduction, high availability,
scalability and elasticity.
I also noticed with interest that there are a few things
that are new such as the addition of vector capabilities and some things that I originally
thought they were new like the workload prioritization but it turns out that maybe they are not so
new after all. So I guess I'm going to ask you next, Gui: is there any of those themes that you'd like to pick
and just share a few words about?
Sure.
So in general, vector capabilities
is one thing that we're recently implementing and working on.
So this will actually match some of the Cassandra features,
but also extend them in a way that will be unique
to ScyllaDB in the future.
And the way that happens is, currently,
we have implemented the data types, data structures,
and query capabilities for working with vector types.
Those are actually basic data types of the database.
And as we extend that project,
we'll also cover a service to do indexing for those vector data points.
So vector search will be part of the future of ScyllaDB, especially in ScyllaDB Cloud, where we aim for it to be a fully managed service,
not only for the core database,
but also for this extended layer.
Okay.
And in terms of workload prioritization,
you also mentioned that it's seemingly new,
but it has actually been gaining a lot of traction,
because customers realize that they can run multiple workloads on the same cluster.
So even if they have concurrent live, real-time, and analytics workloads,
they can run both of them in the same cluster, which is not the case with Cassandra.
You can have a target for latency on your online and batch or analytics operations and still hit those
marks while saving a lot of infrastructure and licensing costs, as
you're leveraging the same hardware.
Indeed, and just to give a bit of my own perspective, let's say on this, the reason why I thought
originally that this is a new feature is, well, a couple of reasons, in fact.
First, the fact that I wasn't aware of it, so I thought, well, if I didn't know, it seems like a really useful feature,
so if I wasn't aware of it, it must be new. Plus the fact that there was a talk on this in the recent event,
again from Felipe, so I thought, well, this looks really cool.
I didn't know about it.
There's a talk, so it must be new.
So Felipe, do you want to clarify?
So when exactly was this introduced?
And is there a specific reason why
you chose to highlight it this year?
Is there maybe something new added
to the existing capabilities?
So thanks for your question, George.
Workload prioritization was introduced in 2019.
There are always new features and capabilities.
But I can't recall anything specific for this year.
But a few years back, probably last year or the year before,
we actually introduced a feature called Workload Characterization,
which basically extended
the previous workload prioritization capabilities.
The idea of characterizing a workload is that if we stop and think about it,
users basically run and use a database in a variety of ways, right?
But the main reason why most organizations
typically run ScyllaDB is when they need very high throughput
and low latency workloads.
Yet, I mean, sometimes they may need to run
some very intensive batch workloads, some
sort of analytics, or they may have a user who is running ad hoc queries, either because
they're debugging something or doing some sort of data analysis on their own.
And what happens is that it's very challenging for the database, a database alone, to actually know how to prioritize
one query versus another, right?
And this is really the power that workload
prioritization gives to the user.
So you basically can assign different service levels,
that's the technical term we came up with.
And each service level is going to be isolated from another.
Which means that if you have a service level which has lower priority than your main workload
and your database is currently running under contention, meaning that either it's CPU or memory
or just resources are basically overutilized,
then ScyllaDB will start prioritizing the workload
you defined for it to prioritize.
So it's a very interesting feature.
It's unique to ScyllaDB; no other database has anything close
to that. And I believe the reason why we are calling it out this year is because, basically,
as you said it yourself, you were not aware of it, right? So many users are still not aware of
all the unique ScyllaDB features that we have in comparison to other databases,
and we every now and then want to call it out again so that we bring more awareness
to our users and community.
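To make the idea of service levels and shares concrete, here is a toy Python sketch of shares-based scheduling. It is not ScyllaDB's actual scheduler; the workload names and share values are made up for illustration, and the stride-scheduling technique is just one simple way to realize proportional shares.

```python
# Toy stride scheduler illustrating shares-based prioritization between
# two workloads. Each service level advances a "virtual time" inversely
# proportional to its shares; the level with the smallest virtual time
# runs next, so higher shares translate into more frequent service.

class ServiceLevel:
    def __init__(self, name, shares):
        self.name = name
        self.shares = shares
        self.vtime = 0.0  # virtual time consumed so far

def run(levels, slots):
    served = {sl.name: 0 for sl in levels}
    for _ in range(slots):
        sl = min(levels, key=lambda s: s.vtime)  # pick most "underserved" level
        served[sl.name] += 1
        sl.vtime += 1.0 / sl.shares  # more shares -> smaller step -> runs more often
    return served

# Hypothetical setup: an OLTP workload with 4x the shares of a batch workload.
levels = [ServiceLevel("oltp", 800), ServiceLevel("batch", 200)]
print(run(levels, 1000))  # oltp is served roughly 4x as often as batch
```

Under contention, a scheme like this keeps the lower-priority workload progressing while the prioritized one gets the lion's share of resources.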
Okay, cool.
That's a great segue for me to ask.
There was quite a variety,
lots of themes in your lineup for the event you just had,
and lots of user presentations as well,
which is always good.
I'm guessing that you must have watched them all,
at your own time perhaps, because they're also
available on demand.
So I'm going to ask you both,
starting from Gui,
do you have a couple of favorite use cases?
I have one that is really close to my heart,
which is Medium's feature store redesign.
I worked closely with them in the SA,
or Solutions Architect, capacity,
and I helped them discuss and remodel their data
set in order to achieve the results that they
presented during the event.
So seeing that materialize
as a public presentation, even though I was not mentioned,
I was still really happy to see them on screen,
gathering so much attention for how much
positive impact good data modeling and
reasoning about your access patterns make for ScyllaDB success.
I would also highlight Udmose,
which is another prospect that I worked a bit closely with, which migrated
from DynamoDB to ScyllaDB Alternator with great success: reducing cost, improving latency,
and being able to be flexible on their vendors, on cloud vendors in general.
Okay.
Thank you.
How about you, Felipe? Well, that's... I could probably speak for an hour just on that.
But, I mean, before I actually talk about customers who actually spoke at our recent conference,
Monster Scale Summit, I would like to basically call out back on what Guy said,
because he basically mentioned Medium's feature store.
And I work somewhat closely with Clearview AI;
they gave a talk about how they are using ScyllaDB
for an AI workload.
And that's very interesting because I see,
nowadays we have lots of new AI
and machine learning stuff coming up, right?
Another particular customer that I had the opportunity to speak alongside was TripAdvisor.
So last year, TripAdvisor and I spoke at AWS re:Invent,
where TripAdvisor basically spoke about how they use ScyllaDB to serve
real-time features. So imagine, whenever you hit the TripAdvisor website, they quickly need to
figure out who you are, because you probably used TripAdvisor in the past, right? I don't know,
maybe you were planning a trip to, say, Lisbon, Portugal, and you decided to book something with them. And then as soon as you
visit their website, they use ScyllaDB to quickly look up this data, identify who you are,
and then provide recommendations in real time on their website. So as you can imagine,
this has to be really fast, because users' attention span
is not very long, right? So if it just takes, I don't know, 300 milliseconds, there's a very high
chance they may lose the user. So they have a very interesting talk on how ScyllaDB really helps them
to provide those real-time recommendations to their users.
Now, back to the main question, which is the talks specific to our Monster Scale Summit. A particular customer talk that I also loved was Discord's,
on how they, well, they had two talks, right? One was how they run trillions of
searches at scale, which was how Discord manages
their Elasticsearch infrastructure.
But more closely related to ScyllaDB is
how they run upgrades really at scale.
So I also had the opportunity to work really closely
with the Discord team.
I think they're super smart.
I always learn a lot working with them. So if they are listening to this somehow: guys, thank you so much
for everything you taught me throughout all those years.
But yeah, it's a very interesting talk, and the way they
manage their distributed systems is really amazing.
Cool.
All right.
So I guess it's a good time to head back to one of the features,
or rather two of the features you talked about early in the conversation,
which I think, based on my understanding, are kind of core to the evolution of ScyllaDB.
And they're also very, very new. These, we are certain that they are new.
So we're talking about tablets and strong consistency.
So tablets is a new data distribution algorithm
that in the latest ScyllaDB version
is replacing the legacy vNodes approach
that was inherited from Cassandra.
And there are strongly consistent topology
updates for metadata.
And I think before we get to discuss about what they are
and the motivation and some use cases,
I think it's useful if we start by introducing
a bit of background.
So first on the Paxos versus Raft protocols,
because this is central to understanding tablets
and also a little bit of background knowledge
on strong versus eventual consistency,
because again, this is kind of fundamental knowledge
to be able to follow this.
So who wants to pick what?
So with regards to the Paxos and Raft comparison, the initial algorithm that was
used, inherited from Cassandra and implemented for strongly consistent data operations, relied on Paxos. And now ScyllaDB is not only applying this to topology
updates, but also to metadata updates,
meaning that whenever there is any change in metadata
or any topology change in the cluster,
that change is approved and committed
by the Raft distributed consensus algorithm.
So in this case, we can do that fairly rapidly, consistently,
and really at a distributed scale.
So for instance, on this aspect alone,
and we can talk about tablets later, it
allows clusters to be scaled, with nodes
joining a cluster in mere seconds,
unlike the previous approach,
which used gossip for cluster topology
and allowed only a single node
to join the cluster at a time.
So if you had to scale from three to six nodes,
that was a linear operation of adding one additional node
at a time.
So in that sense, now that we have changed to strong consistency
for topology updates and also for metadata,
we can add three nodes instantaneously to the cluster
and then allow the cluster to rebalance itself using tablets,
which we'll touch on later.
Yeah, and with regards to your second question, which was the difference between
eventual consistency and strong consistency, I would rather give you an example. So Gui mentioned the gossip protocol. And the thing about gossip, and by the way, gossip is still used inside ScyllaDB, but as part of this Raft effort,
which is still ongoing,
we are relying less and less on gossip.
So let me give you an example on how things work
before we actually had Raft.
So before Raft, if you were to basically bring down a node, suppose you are going to bring
down a node for changing a configuration, upgrading it, whatever.
You bring down a node, and because gossip is basically an epidemic protocol, as soon as a node goes down, not all nodes in the cluster are immediately aware of the fact that this node went down.
This information takes some time to propagate, right? So imagine you have a cluster of, which for ScyllaDB is very easy, I don't know, 100 nodes.
So you bring one node down, and then, because gossip is an epidemic protocol,
one node communicates with another,
and then this other node communicates with another, and so on.
So there are lots of round trips going on here,
and it takes time for this information
to disseminate throughout the cluster.
So a situation like this could happen.
You shut down a node
because you were going to do a maintenance operation,
but another node took too long to recognize the fact
that the node was down.
As a result, simply because you shut down a node,
boom, your latency spikes, and you never understand why.
Until you realize that it's because it took too long for Gossip to actually disseminate that information.
With Raft, basically we have a state machine.
So with Raft, we will have a single leader,
and whenever those metadata and state changes actually happen in the cluster,
this information is first sent to that state machine,
and then all nodes communicate with that state machine,
and they will automatically realize that,
okay, one node is down, so we have flagged
this node as down, and then your application can continue running
without it taking too long for a specific node or a group of nodes
to realize that something changed in your cluster.
So that's basically a good way to differentiate eventual consistency from strong consistency.
One way or another, both systems will eventually converge
to the same state at the end of the day,
no matter how long it takes,
our systems will eventually realize
that one of the nodes went down.
The problem, or rather the solution, is: how fast can that realization actually come to be?
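The propagation delay described above can be pictured with a toy epidemic-dissemination simulation. This is not ScyllaDB's actual gossip implementation; the fanout, cluster size, and random seed are arbitrary, and the point is only that the news spreads over multiple round trips rather than in one step.

```python
import random

# Toy epidemic (gossip-style) dissemination: count how many rounds it
# takes until every node has learned that one node went down.
def gossip_rounds(n_nodes, fanout=3, seed=42):
    rng = random.Random(seed)   # fixed seed for a repeatable run
    informed = {0}              # node 0 observed the failure first
    rounds = 0
    while len(informed) < n_nodes:
        rounds += 1
        # each informed node tells `fanout` random peers this round
        for _node in list(informed):
            for peer in rng.sample(range(n_nodes), fanout):
                informed.add(peer)
    return rounds

print(gossip_rounds(100))  # a handful of rounds, not one, for 100 nodes
```

With a Raft-style authoritative state machine, by contrast, a single committed update makes the change visible to every node that consults it.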
Okay, so I have a question. You mentioned that in the second case, not with gossip but using the Raft protocol, there is a state machine,
and you have a leader that the rest of the nodes communicate with in order to get informed
about the up-to-date status of the cluster. So obviously, I'm going to ask you, all right,
so what happens if your leader goes down? I guess there must be some kind of resilience.
So by default, there's an election process. One difference between Paxos and Raft is that Paxos is basically a leaderless consensus protocol, right? So there is pretty much no leader,
but with Raft, you always have a single replica which is considered to be the leader. So naturally you might realize, okay,
isn't this a single point of failure?
Yes, it is.
But with ScyllaDB, we basically never have a single leader.
For example, with tablets specifically,
we have one leader per tablet.
And the idea of tablets,
and because I know we are eventually going to diverge
to discuss more in depth about tablets,
is that a tablet is basically a logical abstraction
that basically partitions your data,
your tables into smaller fragments.
And the name of those fragments,
we decided to call them tablets.
And those tablets are independent from one another, which means that ScyllaDB, with Raft, can atomically,
and in a strongly consistent way,
move them to other nodes as needed, as your workload grows or shrinks on demand.
So basically we have lots of Raft leaders, which means that if one fails, you pretty much don't mind much.
An election will happen for those leaders which failed, and you also have many other leaders in other nodes and replicas, on a per-tablet basis.
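The fragmentation just described can be pictured with a minimal sketch: a table's token range split into tablets, each with its own replica set and leader. Everything here is illustrative — the token-space size, tablet count, and round-robin placement are made up for the example, not ScyllaDB's actual layout.

```python
from dataclasses import dataclass

FULL_RANGE = 2**64  # size of the token space (illustrative)

@dataclass
class Tablet:
    start: int        # token-range start (inclusive)
    end: int          # token-range end (exclusive)
    replicas: list    # node ids holding a copy of this tablet
    leader: str       # one replica acts as this tablet's leader

def make_tablets(n_tablets, nodes, rf=3):
    """Split the token space into n_tablets fragments, round-robin replicas."""
    step = FULL_RANGE // n_tablets
    tablets = []
    for i in range(n_tablets):
        replicas = [nodes[(i + j) % len(nodes)] for j in range(rf)]
        tablets.append(Tablet(i * step, (i + 1) * step, replicas, leader=replicas[0]))
    return tablets

tablets = make_tablets(8, nodes=["n1", "n2", "n3", "n4", "n5", "n6"])
# Leadership is spread across nodes, so losing one node only affects the
# tablets it leads, and each of those can elect a new leader independently.
print(sorted({t.leader for t in tablets}))
```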
Cool.
So we already started getting into the details about tablets,
so you quickly described what they are.
Can either of you give a little bit of motivation?
So what are some use cases in which tablets are useful?
So I'll get started; Felipe, you can add more use cases later.
So the way tablets work, by breaking down those token ranges into smaller, more manageable pieces,
and in a strongly consistent way, is that we're able to really quickly move them across
servers.
So this is really aimed towards fast scaling of a cluster.
So when I said that you can add three nodes really quickly
to a cluster, that also means that those three nodes join
the cluster and then rebalance these tablets
in a parallel and consistent way, but also extremely fast.
You can scale terabytes of data in
mere minutes rather than hours,
which was the case with the previous algorithm.
In this case, when we're talking about
machines that have higher capacity,
that also means that they have
a higher storage density to be used,
and tablets also balance out in a way that will
fill those disks evenly. So all nodes in the cluster will have a similar
utilization, because tablet balancing adjusts the number of tablets within each node
according to the number of vCPUs, which is always tied to storage in cloud nodes. So in this sense, as storage utilization is more flexible now,
and as we can scale more quickly,
it also allows users to run at a much higher storage utilization.
Felipe is showcasing here that our aim is to run at
up to 90 percent storage utilization, instead of 50 to 60 percent,
because tablets and automations in the cloud will also allow us to very rapidly scale the
cluster once those storage thresholds are exceeded. If you stop and think about it, I mean,
you must be crazy to run a database
close to 90% storage utilization, right?
But with ScyllaDB, that's pretty much normal.
That's one of the benefits of tablets.
So what we are seeing here is exactly
as Gui was alluding to.
We first start, imagine that this diagonal purple line
is basically storage utilization.
We start with a small cluster with a single node.
This is basically per zone, so it does not really
reflect a real three node cluster, which
would be the minimum we recommend for production
purposes.
But the idea remains.
So imagine that your storage utilization grows over time.
And as you reach close to 90% storage utilization,
what we do is simply we add a new node.
And thanks to tablets, we can really stream that data very,
very quickly.
In fact, we have some demos showcasing how it works.
And we even run many tests on a daily basis
using our ScyllaDB Cloud, which guarantee that
if you hit 90% storage utilization,
tablets will have enough time to actually scale out
your infrastructure without letting you run out of disk space.
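The arithmetic behind this scale-out trigger is simple. Here is a back-of-the-envelope sketch: the 90% threshold matches the discussion, but the node capacity and data size are invented for the example, not ScyllaDB Cloud defaults.

```python
# Illustrative scale-out trigger: once per-node storage utilization
# crosses a threshold, add a node and let tablets rebalance the data
# evenly across the (now larger) cluster. Numbers are made up.

THRESHOLD = 0.90

def per_node_utilization(data_tb, nodes, node_capacity_tb):
    # assumes tablets keep data evenly balanced across nodes
    return data_tb / (nodes * node_capacity_tb)

data_tb, capacity_tb, nodes = 2.8, 1.0, 3
util = per_node_utilization(data_tb, nodes, capacity_tb)
print(f"{nodes} nodes: {util:.0%}")   # 93% -> above threshold, scale out

if util >= THRESHOLD:
    nodes += 1  # tablets stream to the new node in parallel
util = per_node_utilization(data_tb, nodes, capacity_tb)
print(f"{nodes} nodes: {util:.0%}")   # back down to 70%
```

The same logic run in reverse (remove a node when utilization drops) is what makes scaling down on demand attractive too.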
Now, what is the value of this?
Why do I want to run my database
close to 90% storage utilization?
Well, many people say that nowadays
storage is actually cheap, right?
I mean, compared to memory utilization,
they are right.
Storage is definitely cheap.
But the problem is that it still costs something, right?
Think about those hyperscalers,
companies like Apple and Netflix, where they basically have thousands of Cassandra nodes.
Very likely they are running their storage utilization somewhere
close to 50-70%, which means that they have at least 30% of per-node disk space that
they don't use. Why? Because it's too dangerous, because it takes too long to scale. As Gui also said, one real benefit of tablets is that you can add several
nodes in parallel. So if you want to, I don't know, 10x your traffic, you simply go and add
10x the number of instances you have, and tablets will handle all the rest for you.
In comparison with other databases, you typically have to run those operations serially.
You add one node, then you wait,
then you go and add another and then you wait,
and just takes time.
So with tablets, you can easily scale out
as well as scaling your infrastructure on demand,
which means that if you have a workload
which is very seasonal, for example,
throughout the day you may need,
I don't know, to run at your peak capacity,
but overnight you don't need that spare capacity,
so you can simply scale your database up and down on demand.
And that, at the end of the day,
results in even higher cost savings
beyond simply storage utilization.
So those are really the main benefits of tablets.
But there's a third one that I personally feel is also important.
ScyllaDB with tablets brings to users an ability that previously was only
available in cloud infrastructure. Think about it:
how many databases out there support this kind of elasticity that you can simply
install and deploy on premises, in your own facilities?
I can think only of ScyllaDB, apart from cloud-native ones like Bigtable, Google Cloud Spanner,
AWS DynamoDB, and so on.
So tablets also put this capability, which was previously tied only
to cloud workloads, into the hands of users.
Okay, so it seems like the main idea here
is to get your storage to be more granular basically.
And by doing that, you gain in flexibility.
However, usually, nothing comes for free
and there's always a trade-off.
So I'm guessing that the trade-off you make in this case is additional complexity in implementation.
I wonder if that translates to additional CPU usage, for example, at runtime.
Is that something you have benchmarks for?
So in terms of complexity, ScyllaDB and the accompanying ScyllaDB drivers
that are compatible with tablets handle everything for the user.
Of course, when a user is planning
a cluster, a table, and some configs for the workload,
they have to have some considerations, such as:
how much data do you expect to be in this table,
and into how many token ranges
should that data be split at creation time.
But other than that, there are really
no additional concerns that an end user
might have as the cluster administrator.
In terms of CPU utilization,
it is even much more efficient the way that we're doing this now,
because, as we discussed,
we can transfer a tablet from one node to the other as we're scaling.
Not only can we do that in terms of tablets,
but also, the previous way of streaming data
was that rows were actually being read and streamed one by one,
in buckets, over the network.
And that was not as efficient as it could be.
Now, tablets will transfer a single file
through the network as the cluster is scaling.
So there is no row-by-row breakdown of the operations,
just a single bulk transfer that reaches
the maximum network capacity as it scales.
That's why it's extremely fast to scale
a cluster under these circumstances.
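The intuition is that per-row work dominates, and a single bulk file transfer amortizes it away. A toy model with entirely invented numbers (row counts, sizes, link speed, and per-row overhead are all illustrative, not measurements):

```python
# Toy comparison of row-by-row streaming vs. shipping one file, to show
# why a bulk transfer wins: the wire time is the same, but the per-row
# serialize/deserialize overhead disappears. All numbers are made up.

def row_stream_seconds(rows, row_bytes, net_bytes_per_s, per_row_overhead_s):
    transfer = rows * row_bytes / net_bytes_per_s
    overhead = rows * per_row_overhead_s  # CPU cost paid once per row
    return transfer + overhead

def file_stream_seconds(rows, row_bytes, net_bytes_per_s):
    return rows * row_bytes / net_bytes_per_s  # one bulk transfer, no per-row work

ROWS, ROW_BYTES, NET = 50_000_000, 200, 1.25e9  # ~10 GB over a 10 Gbit/s link
print(f"row-by-row: {row_stream_seconds(ROWS, ROW_BYTES, NET, 2e-6):.0f}s")  # 108s
print(f"file-based: {file_stream_seconds(ROWS, ROW_BYTES, NET):.0f}s")       # 8s
```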
Felipe, do you have anything else to add?
Yeah. The feature you just mentioned is our zero-copy streaming,
which we call file-based tablet streaming, yeah.
But you do have a good point, George,
because, as it turned out, tablets were a major effort.
And they still are, in a sense.
So for example, there are some features
that today we do not support with tablets,
but we are working really,
really hard to actually bring support for those features. All of those are listed in our
documentation. But in general, the feature is stable, the performance is great,
and the benefits are many. So we have run some performance tests. We haven't seen any performance degradation of any sort.
But again, one way or another, it's still a new feature.
So if there is any performance problem, you know how we handle those things here at ScyllaDB.
Then it's a bug and we got to fix it. Just to clarify, I wasn't thinking of complexity for the end user, let's say,
but more in terms of complexity in implementing the feature.
So for you, guys, the engineers in the crew basically,
and just out of curiosity, how hard was that to implement?
It definitely took some years to get it ready. Basically, tablets involved re-engineering many critical paths of our database. For example, compactions, which is a very fundamental background process that exists both in Cassandra and ScyllaDB.
We needed to introduce a concept known as compaction groups to basically deal with those
tablets and ensure that we do not compact SS tables which are from a different tablet
incorrectly. Repair also needed to come with its own changes. As Guilherme said, our drivers all needed to support what we call today tablet awareness. As you can imagine, when a client application is going to communicate with the database, the most efficient request is one that hits a node that is also a replica for the data it's after. So basically, tablet awareness ensures that the driver always knows which node is a replica for the tablet that it's querying against. Those were all things that we needed to implement from ground zero.
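The tablet-aware routing idea can be sketched as a lookup structure on the client side: the driver keeps the mapping of tablet token ranges to replica nodes and sends each request straight to a replica. The class and method names below are illustrative assumptions, not the actual driver API.

```python
import bisect

# Hedged sketch of tablet awareness in a driver: a sorted list of tablet
# range endpoints lets the client find, in O(log n), which nodes hold a
# replica for a given partition token.

class TabletMap:
    def __init__(self, tablets):
        # tablets: list of (last_token, replicas) sorted by last_token;
        # each tablet owns tokens up to and including last_token.
        self._ends = [end for end, _ in tablets]
        self._replicas = [r for _, r in tablets]

    def replicas_for(self, token):
        """Return the replica nodes for the tablet owning `token`."""
        i = bisect.bisect_left(self._ends, token)
        return self._replicas[i]

tmap = TabletMap([(-100, ["node1", "node2"]),
                  (0,    ["node2", "node3"]),
                  (2**63 - 1, ["node3", "node1"])])

assert tmap.replicas_for(-500) == ["node1", "node2"]
assert tmap.replicas_for(42) == ["node3", "node1"]
```

By consulting such a map before sending a request, the driver avoids the extra network hop through a coordinator that is not a replica for the data.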
And yeah, so it wasn't easy, but we finally got to a state where we can say: hey, tablets are production-ready, folks.
Great. So I guess that means that it's already used in production by end users.
Correct.
OK.
And by the way, that just made me think: what kind of feature parity do you have comparing the on-premise version to ScyllaDB Cloud? The reason I came up with this question is that I just thought, well, all right, is it also deployed on ScyllaDB Cloud?
So the way that ScyllaDB Cloud is designed is to actually leverage the very same binary, down to the actual binary, that you would run if you were running ScyllaDB Enterprise and managing it yourself. The only limitations of ScyllaDB on the Cloud are a few customizations in terms of integration with the rest of one's infrastructure. For instance, the database deployment is isolated in a single VPC, so there's no outward connection from the ScyllaDB deployment out into the customer VPCs or the customer networks. And then LDAP, or any sort of network authentication, doesn't make a lot of sense in that scenario. So that's one of the features that are not available in the cloud. Recently, we also introduced data encryption, so encryption at rest, as we call it.
And we can also leverage KMS, which is a customer-provided key management system that will manage the keys for the encrypted SSTables. That can be deployed on ScyllaDB Cloud and could also be deployed on ScyllaDB Enterprise.
So as time goes by, we are aiming towards covering and minimizing that gap, so our users see a much higher value in migrating to the cloud rather than managing ScyllaDB by themselves, which can be a pretty complex task.
Yeah, indeed.
So the last time we caught up with Dor, actually in the two previous times that we caught up, the last one and the one right before it, ScyllaDB Cloud was a major conversation point, because each of those times, Dor kept reporting year-on-year growth in ScyllaDB usage, in ScyllaDB Cloud usage, and in revenue. So I'm wondering if you are aware of whether that growth has been going on, basically.
I don't have access to exact financial numbers, but definitely.
I wasn't so much talking about that but more about usage.
No, yeah, definitely we are growing. I mean, we see the adoption of ScyllaDB growing on a daily basis. It's actually really, really impressive. And as you can imagine, that's also one of the reasons why we spoke during our summit about the many new features we are bringing.
Because during our summit, as we already said, we spoke about how we are going to implement vector search. But beyond that, we also talked about our tiered storage with support for S3 object storage. Not just S3, right? When we refer to S3, we basically refer to any object storage from any cloud. And those features will also allow ScyllaDB to attract even more users than what we are seeing today.
So I would say that, and Gui, correct me if I'm wrong, I think on a weekly basis I see at least one new ScyllaDB user coming in, using our cloud. And yeah, I mean, the growth has been impressive and the acceptance of our database is really, really good. I mean, we have lots of very good logos showcased on our website.
We spoke about a few already, like Tripadvisor, G Squared; we have Hulu, right? Wish; others they might be seeing. We spoke about Udimo, and they're awesome. Medium, Digital, yeah, and so on.
I think I also recall seeing American Express as a user,
and yeah, a couple of, well, more than a couple of other big names as well.
But I just realized we're very close to wrapping up,
and you just mentioned the magic word, so new features.
And we've already kind of talked a little bit about vector.
And this is something that honestly I was expecting to see,
because I don't think there's like a single database today
which is not adding vector capabilities.
And for very obvious reasons.
You also mentioned tiered storage.
And again, that's something that in one way or another I think has been ongoing for a while. I remember Dor mentioning an effort towards utilizing S3 for storage, which I guess is still ongoing. So if you want to talk about those, or any other upcoming new features. So with regards to tiered storage specifically, it somehow ties back to what I previously said: users use their databases in a variety of ways, right? When ScyllaDB was originally conceived, again, it was all about raw performance. We want to give you the lowest latency possible. But sometimes users don't want to, you know, stream their data to another database. Sometimes their data grows too large and the costs become a problem, because they have to scale out their infrastructure in order to accommodate the storage growth.
And many times users have data that they read very infrequently. So the idea of tiered storage is actually to give those users a choice, while still using ScyllaDB. Of course, the performance is not going to be the same. We cannot do miracles. I mean, the latency of accessing object storage is much higher than accessing a local NVMe.
But with support for things like object storage, there are many interesting things we could do. Besides only tiering your storage by itself, we could come up with use cases where, for example, your local NVMes would only be used as a fast cache and all your data would be stored only on object storage. I believe Dor already gave this example a few times. When a node fails, for example, instead of having to stream data from the existing replicas, you could simply retrieve data directly from object storage and continue servicing traffic from there. So S3 object storage will, in a way, allow ScyllaDB to do many more things than it's currently able to do today.
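The "local NVMe as a fast cache over object storage" idea described above can be sketched as a simple read-through cache. The class and names below are illustrative assumptions for the example, not ScyllaDB's design: reads are served from fast local storage when possible, and fall back to the slower, cheaper object store on a miss.

```python
# Hedged sketch: local NVMe acting as a read-through cache in front of
# object storage. A dict stands in for an S3-like key/value API.

class TieredStore:
    def __init__(self, object_store):
        self.object_store = object_store   # slow, cheap, durable tier
        self.local_cache = {}              # stands in for files on NVMe
        self.hits = 0
        self.misses = 0

    def read(self, key):
        if key in self.local_cache:        # fast path: local NVMe
            self.hits += 1
            return self.local_cache[key]
        self.misses += 1                   # slow path: fetch, then cache
        value = self.object_store[key]
        self.local_cache[key] = value
        return value

# A replacement node can warm itself straight from the object store
# instead of streaming from the surviving replicas:
s3 = {"sstable-1": b"data-1", "sstable-2": b"data-2"}
node = TieredStore(s3)
assert node.read("sstable-1") == b"data-1"     # miss: fetched from S3
assert node.read("sstable-1") == b"data-1"     # hit: served locally
assert (node.hits, node.misses) == (1, 1)
```

The same structure explains the recovery scenario Felipe mentions: since the object store holds the authoritative copy, a failed node's replacement just starts with an empty cache and fills it on demand.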
Okay. Gui, any new feature that you'd like to pick on?
I would personally be very willing to hear more about the vector indexing and vector storage, particularly because some of the early benchmarks that I saw mentioned seemed very promising. Yeah, so I think the vector approach really leverages ScyllaDB's speed, throughput, and low latency, combined with a very well-designed processing and vector search layer. So our teams are still developing that layer. Those were very early numbers from benchmarks.
But still, I think that's a great sign of how ScyllaDB can integrate really well with those really complex scenarios which require extreme scalability, because you can easily scale to millions of operations per second just to fulfill a couple of hundred thousand vector queries. And you can also leverage ScyllaDB's really extreme low latency, and predictable latency, in order to fulfill those in a really timely manner. So our aim is that applications that really need low latency and high throughput for those scenarios can continue to leverage ScyllaDB as a product on the back end.
I would also highlight strongly consistent user data. We talked about cluster topology changes and actual metadata changes being strongly consistent on Raft, but we're also working towards implementing that at the user layer. This is extremely initial, so there's not even a PoC to demonstrate, but that is an end goal that we have: to really leverage the power of Raft as a distributed consensus algorithm, to make sure that users have the best capabilities in terms of designing their applications in a strongly consistent manner while running ScyllaDB.
All right, so I think we talked about a number of things
actually and time flew by really, really fast.
And we can probably wrap up here unless there's any other,
whatever new feature or any use case or anything you feel that we've left out.
And now is the time to mention it.
Well, I would just like to say to our audience that if you're currently struggling with your database, if your performance is not really acceptable, you might want to check out ScyllaDB.
Okay, cool.
That's a good message to close the conversation with.
So thank you.
Thank you both gentlemen for joining me.
It's been a pleasure.
Thanks for sticking around. For more stories like this, check the link in bio and follow Linked Data Orchestration.