Orchestrate all the Things - Universal semantic layer: Going meta on data, functionality, governance, and semantics. Featuring Cube Co-founder Artyom Keydunov
Episode Date: May 16, 2024
What is a universal semantic layer, and how is it different from a semantic layer? Are there actual semantics involved? Who uses that, how, and what for? When Cube Co-founder Artyom Keydunov started hacking away at a Slack chatbot back in 2017, he probably didn't have answers to those questions. All he wanted to do was find a way to access data using a text interface, and Slack seemed like a good place to do that. Keydunov had plenty of time to experiment, validate, and develop Cube, as well as get insights along the way. We caught up and talked about all of the above, as well as Cube's latest features and open source core. Article published on Orchestrate all the Things.
Transcript
Welcome to Orchestrate all the Things. I'm George Anadiotis, and we'll be connecting the dots together.
Stories about technology, data, AI and media, and how they flow into each other, shaping our lives.
What is a universal semantic layer, and how is it different from a semantic layer?
Are there actual semantics involved? Who uses it, how, and what for?
When Cube co-founder Artyom Keydunov started hacking away at a Slack chatbot back in 2017,
he probably didn't have answers to those questions. All he wanted to do was find a way to access data
using a text interface, and Slack seemed like a good place to do that. Keydunov had plenty of time
to experiment, validate and develop Cube, as well as get insights along the way.
We caught up and talked about all of the above, as well as Cube's latest features and open
source core.
I hope you will enjoy this.
If you like my work on Orchestrate all the Things, you can subscribe to my podcast, available
on all major platforms, my self-published newsletter, also syndicated on Substack,
Hackernoon, Medium, and DZone, or follow Orchestrate
all the Things on your social media of choice.
My name is Artem. I'm co-founder and CEO at Cube.
My background is in computer science, software engineering,
data engineering.
I started Cube as a side project, actually, back in 2017 or so.
And I started with the idea to build a Slack bot. That was my first side project: let's just try to build a Slack bot, back when bots were all the hype, before LLMs.
And I built that Slack bot. And for myself, for my company, that Slack bot was able to get data
from different places, connect to different systems, organize them into a single data model
and push that kind of information into Slack. But by building that side project, I and my co-founder now, who joined quickly to work
on that stats bot, we built Cube.
And then we decided to open source Cube, and then we eventually built a company around
that.
So that's an interesting story: Cube as a project was really created as a side effect of another project, which is no longer around.
And I felt like it's a very cool story.
But yeah, we started Cube as an open source in 2019.
And then in 2020, we built a company around it.
And now I'm co-founder and CEO. Okay, yeah. Indeed, I wouldn't have guessed that what you do now, which is basically all around semantic layers, started out as a Slack bot.
And to be honest, I don't really see the connection. So if you want to build a bot, whether you have a good LLM as we have right now, or back then, you know, seven years ago, without LLMs, regardless, if you want your chatbot to be able to access data in warehouses or databases, you need to break it down into two steps. So one step would be how I go from natural language
to some multidimensional data model representation,
and then how I go from that sort of a data model to SQL.
So it's really almost impossible to go from text to SQL directly.
So you need to go from text to data model
and from data model to SQL, essentially.
So that's how Cube came to be created, because my co-founder and I
were working on that Slack bot.
We built this intermediate layer, which is Cube now.
So our bot was able to go from text to data model
and then from data model to the query in a database or data warehouse.
And even now, if you think about using it with an LLM, that's what a lot of people do right now with Cube. We've kind of come full circle here, but people are building chatbots with Cube right now. And what they're doing is taking all the context from a Cube semantic layer and using that context to generate correct SQL queries. So that's essentially why we started Cube in the first place. So that's the connection.
Well, yes, it does make more sense now, because either I missed it or maybe you didn't mention it, but when you talked about the Slack bot, I missed the part where you said what exactly the Slack bot was supposed to do. And I'm guessing, based on what you just said, from the context, that the purpose of the Slack bot was to actually serve as a textual interface to query, I don't know, a data warehouse or something like this, right? So now it does make sense. I can see the connection indeed.
Right. Yeah. And you're right. That's what this Slack bot was doing. So the idea was that,
you know, like a lot of teams, they were using Slack, still using Slack, and they have a lot of different systems to access data. And the idea was, what if we can bring this data right
directly into the Slack, right? So you're still on a Slack, you do all this, you know,
like a conversation, discussion,
but what if you can just pull your data,
say from Salesforce or from, you know,
like some performance analytics
from New Relic or some other system
right into Slack,
so you can just do a query right there in Slack with natural language.
So that was the idea: to pull the data from different sources
and put it into Slack. So I think the key, well, at least in some ways, the key to all that is what you just
said. So, data from different sources. As long as you're only using a single data source,
that also implies you have a single schema, so things are relatively straightforward.
It starts getting much more complicated as soon as you add data sources, because in all
likelihood, your two or more data sources will not have the same schema, or maybe they
don't have a schema at all.
And so you have to somehow start mapping things, basically.
And I guess that's what semantic layers are all about.
And so there's a number of things that we would like to cover in this conversation.
And also the fact, as I mentioned, that you have some news coming up, which we're going
to touch upon.
But I think it's best if we start from the actual definition of a semantic layer, because
I think for many people, it may
not be entirely clear. So how would you define what you do? So what constitutes a semantic layer?
I think we have all used semantic layers, in the way that I define them, at some point in our professional work, when we used BI.
So many of us use some business intelligence tools at some point.
We kind of drag and drop measures and dimensions onto the canvas to build a chart.
Think of Tableau or some other BI tool.
So that is essentially a semantic layer.
When we do something like drag and dropping,
what the BI tool does is take our metrics,
our measures from the UI,
and then it translates them into the query
to the underlying data source.
In many cases, it's a SQL data source.
It used to sometimes be something different, especially back then.
But today's semantic layers inside BI tools,
they essentially take measures, take dimensions,
all of this high-level business objects,
revenue, active users, all these definitions,
and then translate them down to the underlying SQL queries.
And sometimes it's not one-to-one.
So sometimes you bring multiple measures, multiple dimensions,
and they can translate into multiple SQL queries
because you need to do more complicated calculations
to get to these metrics and then do some post-processing.
But at the end of the day, it's how we go from a business-level definition,
business-level metrics, into the database SQL queries.
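To make the translation concrete, here is a hedged sketch of the kind of rewriting a semantic layer performs: a business-level request for "revenue by quarter" might be compiled into warehouse SQL roughly like the query below, where the orders table, the amount column, and the metric name are all hypothetical examples.

```sql
-- Hypothetical output of a semantic layer compiling the business metric
-- "revenue", broken down by the "quarter" time dimension, into warehouse SQL.
SELECT
  DATE_TRUNC('quarter', o.created_at) AS quarter,
  SUM(o.amount)                       AS revenue
FROM orders AS o
GROUP BY 1
ORDER BY 1;
```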
And the semantic layer as an idea is not new.
It's been around for a while, since the first days of business intelligence.
In fact, Business Objects had a patent for a sort of semantic layer idea in the 90s.
And then MicroStrategy successfully defended against it.
And the interesting problem that happened, and why Cube and the idea of a universal semantic layer are relevant now, is that we went from the first wave of BI, where we had one big BI tool like Business Objects or MicroStrategy being used inside an organization, to this current wave where we have just an explosion of different data visualization and data consumption tools within the organization.
So more data consumers wanted to access data in a more self-service way.
There is this whole idea of democratizing data, right?
So we want to give more tools to data consumers to slice and dice data however they want.
But it also comes with a cost of having more tools.
And cloud architecture made it easier; it's easier to buy new tools.
And there are a lot of BI tools out there.
So what's happening is that in an organization right now, you have 10 or 20 BI tools, and every BI tool has a semantic layer. So now you have sort of multiplied your semantic layers inside your organization. And the problem is that you don't know which one is correct. You might have one BI tool with all the metrics, and a second BI tool with sort of the same metrics, and they are not going to be in sync.
It's always like this in engineering: when we repeat ourselves, the system is going to be error prone; it's going to get out of sync, inconsistent. We even have this idea of DRY, right? Do not repeat yourself. So essentially, what we're doing with this universal semantic layer is trying to apply the DRY principle to the data architecture at scale. The idea is, we say: okay, you have all these 10 BI tools, they all have semantic layers, they all have definitions, and you probably don't want to repeat yourself and build the same metric across multiple systems. So let's take the definition out and put it in one place
where you can have it under the version control,
you can have all the best practices, how you want to manage your metric,
and then just use that metric across all of these BI tools.
And if you want to add more, 10 more BI tools, just add them.
You still have your metrics in a single place.
So that's the idea of the universal semantic layer.
In a way, it sounds like a new twist to an old problem.
I mean, you could be in pretty much the same situation if you had, well, it sounds like a kind of meta layer in a way. So yes, you did have semantic layers ever since you had BI tools.
But now, as you're describing, because of the fact that you have many BI tools, you need to go one level up and define a meta-semantic layer to talk to all these different semantic layers.
So it's like integrating different databases with different schemas, in a way, just on a different level.
Right. Yeah, correct. I think the problem here really comes with just the scale of the stack. If you have only one BI tool, and you're going to use only that one BI tool for the entire life of the organization, you probably don't need an extra semantic layer; what you have inside that BI tool is fine. But that's realistically not going to happen, right? People are going to use Excel, Google Spreadsheets,
custom-built applications, all of that.
So at this point, you really need to make sure that your semantic layer can work outside of your BI tool, right? Because if it's only working within that BI tool, and Google Spreadsheets is not using the metrics, or Excel users are not using the metrics defined in that BI tool, that's really bad, because you want those same metrics to be consistent across the organization.
The difference with the database analogy, let's say, is that at least in the database
scenario, assuming you're talking about a relational database, you do have a
standard that you can reuse. So you can always, if you have your queries, you can
always revert, let's say, to using ANSI SQL and you're pretty certain that your
query will work across your databases. But how do you ensure that, you know,
your definitions will actually work across your different BI tools?
Because do you have something like a SQL for metrics?
That's a great question.
Yeah, I think we call it, and think about it as, interoperability.
When you think about the semantic layer, data modeling is an extremely important layer: it's how we help users build the actual metrics. But the other important layer is how people can actually use these metrics, right? Interoperability. So Cube currently has several APIs. One API is just a REST API, and a sort of sister API to that is GraphQL.
That's usually being used if you want to build your own application or query data programmatically.
So you can just send a JSON structured query or a GraphQL structured query
with a list of measures, dimensions, and filters, and just retrieve the data back.
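To make that concrete, a JSON structured query of the kind described here might look roughly like the sketch below; the orders cube and its measures, dimensions, and filter values are hypothetical examples, not something discussed in the episode.

```json
{
  "measures": ["orders.total_amount"],
  "dimensions": ["orders.status"],
  "timeDimensions": [
    { "dimension": "orders.created_at", "granularity": "quarter" }
  ],
  "filters": [
    { "member": "orders.status", "operator": "equals", "values": ["completed"] }
  ]
}
```

The data would then come back as rows keyed by those member names, which the application can render however it likes.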
The other interface we have is SQL.
So Cube pretends to be a PostgreSQL database, a wire-compatible PostgreSQL database. The only caveat here is that the tables in this database, which we call views in Cube, are sort of multidimensional.
It means that they're not real; they're not materialized, right?
We actually materialize them on the fly when you query them.
So that's the caveat here.
But the system handles legitimate SQL; it's Postgres compatible, and we support all the different SQL functions.
And this way, tools like Tableau and other BI tools can connect to Cube.
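As a brief illustration, a query arriving over that Postgres-compatible interface could look something like the following; the orders view and its fields are hypothetical, and the MEASURE() aggregation is a sketch of how a semantic-layer measure can be referenced from SQL, so treat the exact syntax as an assumption rather than a spec.

```sql
-- Querying a Cube view as if it were a regular Postgres table;
-- the measure is resolved by the semantic layer, not by a raw SUM here.
SELECT
  status,
  MEASURE(total_amount) AS total_amount
FROM orders
WHERE created_at >= '2024-01-01'
GROUP BY 1;
```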
And the other layer we support now is MDX.
That's for more like Excel, Power BI, Microsoft ecosystem,
because that's the most common query language in the Microsoft ecosystem.
And then we assume we're going to support DAX as well,
because DAX seems like it's going to replace MDX eventually;
Microsoft is pushing DAX more than MDX right now.
So if I get it correctly, then those different ways,
let's say, of interacting with the underlying BI tools have to do with execution
time really. But how about the actual definition of metrics? So if I want to define something like,
I don't know, sales per quarter or whatever, do I have like a lingua franca that I can use to do that?
Yeah, the way we think about it is that it's all code. One of Cube's big philosophies is being code first, so your data model essentially is just a code base: it's a set of YAML files or Python files. We use YAML for most of it because it's declarative, it's easier to define. So if you want to create a measure, for example sales per quarter, you can just put five lines of YAML as a definition, and then you're going to get that measure.
If you want to do more complex things, sometimes you can write Python code to do more dynamic, imperative things. But most of our customers, most of our users,
when they build a data model in Cube,
they just use YAML definitions.
So that's a YAML code base.
You put it under version control, you use something like GitHub to manage it,
and then you just develop, add more definitions to it.
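As a rough sketch of what such a definition can look like, here is a minimal YAML data model with a couple of measures and dimensions; the orders table and all the field names are hypothetical examples rather than anything from the episode.

```yaml
# Minimal sketch of a Cube-style YAML data model; names are illustrative only.
cubes:
  - name: orders
    sql_table: orders

    measures:
      - name: count
        type: count
      - name: total_amount
        sql: amount
        type: sum

    dimensions:
      - name: status
        sql: status
        type: string
      - name: created_at
        sql: created_at
        type: time
```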
I think the way we think about developing the data model is more like software.
So if you want to add a new measure or a new dimension, you would go into a feature branch, create the new measure or dimension, put it on a staging environment, test things. If it makes sense, then you can merge it, and you can have a pull request review process and all of that. But at the end of the day, it's just a code base. Okay, that's an interesting idea. It
just makes me wonder, you said that the majority of your users work with YAML files. So it makes
me wonder: what are the profiles of the majority of your users? Because in my experience, it would be more the engineering type of person that would be comfortable doing that, not so much the analysts. And in my mind, at least, analysts would be the number one audience for these BI tools, right?
Yeah, great question. I would say most of our users who are working with the data model in Cube directly are data engineers.
The way I think about it is that
if you have a data model semantic layer
in your organization, right,
that role is essentially sort of a data stewardship, right?
It's someone who is building data products, right?
Like some measures, dimensions, data models,
something that can be used downstream
to build visualizations.
And people who are usually building visualizations
in tools like Tableau or some other BI tools,
it's more like a data analyst, right?
Like people who consume data, data consumers.
So our primary users, I would say,
would be more for data stewards, data engineers, people who are building this model.
And they're usually more familiar with code-based tools.
Especially in recent years, I feel there is a big movement in the data world to apply software engineering best practices to data management and putting more things into code.
Now we see Airflow, which is very code-first driven, it's Python code.
Some other tools like Dagster and Prefect, and transformation tools like SQLMesh, are all code-first as well.
So it's really a big movement toward code-first workflows in the data stack.
Yes, you're right about that. And in many ways,
I think there's this idea, this principle, let's say, of applying the same principles that have
worked well for software engineering also in data and data engineering. It's probably a good idea.
The hard part really, in my experience at least, is actually getting non-technical users to do that.
So, you know, people, first of all,
most of the time they have a bit of issue,
let's say a bit of trouble getting their heads around,
you know, the key concepts like pull requests
and pushing code and Git and all of those things.
But even if they do, then actually doing that, I don't know,
firing up an IDE or a command line and just doing those things,
I wouldn't really expect business users to be able to do that.
So it sounds like the type of user that is the key audience for your tool is actually sitting in the middle, in a way: they're sort of taking the requirements from the business users and trying to translate them into code so that they can work it out.
That's correct. I think that's where we are right now. Cube's whole workflow is very engineering and technically driven.
I think that's the correct foundation for building infrastructure.
But I think you brought up an interesting concern: how do we make it more friendly or accessible for non-technical people to be able to make changes to the data model? I think historically a very big problem in data governance, and in data modeling in general, is that it's relatively slow moving, in the sense that if you need to make a change to your data model, a measure or a dimension, you need to go through multiple cycles before you can see a result. And if the workflow is technical, then a non-technical person cannot just do that.
They would need to go and submit a Jira ticket to the data engineering team and just kind
of wait for the next sprint so that the ticket would go in and get finished. I think the opportunity we have now is that the new large language models are really good at generating code. We know that. And there are specifically some models that are really built for code generation. GitHub Copilot works fine and a lot of users use it. So it's a living example of
how LLMs can generate code that is correct and helpful. So I believe, in the future, if you think about the data model as just being code, why can't LLMs generate that code? If we don't have a measure here, why can't an LLM just generate it? It could develop and fix the data model, or do something about it. And if we take these AI agents that can make changes to the data model and pair them with domain experts who may not be technical, they may be from finance, from marketing, they're just data consumers, but they're domain experts.
And we pair them with these technical AI agents that can actually make changes to the code base.
Then the magic can happen.
Then we can iterate on the data model quite fast.
But at the same time, we keep the right foundation in place.
It's still a code base, meaning that we can apply all the software engineering best practices to it. We can have a pull request review system from a central governance data team.
We can do static analysis, impact analysis, CI/CD, all of that.
So I think that would be an interesting opportunity in future for us,
how to leverage LLMs to move faster with data models.
Interestingly, so initially, as you also pointed out, you had this chronic, let's say, situation in which even the slightest change had to go through long, long cycles of, you know, approval and then implementation and all of that stuff. And the whole idea behind
specifically self-service BI tools was to do away with that and kind of make things simpler and
faster, having in mind precisely those business users and being able to serve them, in that they wouldn't be dependent on the technical people to implement, you know, even the simplest of changes.
And now by adding this universal semantic layer, it sounds like this has sort of come in through the back door again.
So we're back in a situation where self-service BI won't cut it.
Yeah, it's always about the balance.
So on one side, you have a well-governed data,
trusted metrics.
On the other side, you have a flexibility and velocity
that you want to give to your data consumers.
So the question is,
how you can give them velocity and flexibility,
but at the same time still keep the correct governance.
Because what could go wrong is that if you give a lot of freedom, people can start building wrong metrics, and these metrics are going to spread across the organization. And the person who built them may not be a data person at all. They built them, they're wrong, that person quits the job, and then the metrics still remain there.
No one knows who built it, how it's working
but people have built layers on top of that
and now a lot of things are broken.
So that's a problem.
But the other problem is that if you put everything under governance,
then you start to move very slowly, right?
People cannot just do ad hoc analysis.
So I think what we need to do is
we need to give a good layer of data governance
that we can maintain,
the data team can maintain,
that's going to be like a gold layer
or platinum layer,
there are a lot of references to metals in the data industry.
But we can keep that governance layer in place.
But what we also need to support is ability to create ad hoc calculations
on top of governed metrics inside these BI tools.
So the analyst, the data consumer, can go into a BI tool, Tableau for example, load the data model from Cube into Tableau and just start doing some analysis.
And then they see that a metric is missing.
And then they should be able to quickly build maybe an LOD calculation in Tableau to test something.
And that should be supported on top of these basic metrics.
Once it's built, then the question is: okay, if it makes sense,
we would need to bring it back into the governance layer.
And that's where the workflow could happen.
It could be through the data team in future.
It could be through AI agent to make it faster.
But we still need to give a way for people to quickly build calculations
on the fly, on the edge, to test things because it's not possible to cover everything inside the
data model, inside the governed data model. Yeah, totally agree. I was also going to
add to what you said that, yes, you're right.
You do need to have some sort of process through which you govern your metrics,
because otherwise you're going to end up with chaos.
But I see that as kind of orthogonal to a universal semantic layer, actually.
Another question that I had, actually, was when you mentioned previously that the main way through which you define metrics is actually pretty simple, so YAML files. And I was wondering, because personally I come from a background of, well, more elaborate, let's say, data modeling, so knowledge graphs and ontologies and kind of formal semantics and all of that stuff. And to me, when people talk about semantic layers, it always kind of raises the question: okay, is anyone using that kind of technology in an actual semantic layer?
Have you considered that?
Do you think it would make sense
to introduce something like that?
Or if not, why not?
Right.
Yeah, I think naming, the semantics of semantics, has always been a challenge, even in our industry. When we first started to think about building what we now call a universal semantic layer, we had different names for that type of technology. Some people were calling it a metrics layer. Some people were calling it headless BI. And then eventually, as an industry, we all arrived at a term: semantic models, semantic layer. But at the same time, you have knowledge graphs and ontologies and all of that. So I would give you a definition and
try to scope what we mean by semantic layer at Cube, and what other technologies in the data space, specifically in analytics, mean by semantic layer. By others I mostly mean the likes of AtScale, dbt, and the Power BI semantic model.
So usually it's all about building measures, dimensions,
and a relationship between them.
So you would have some entities that hold your measures, dimensions,
and these entities are tables essentially, right?
It could be materialized tables in your warehouse.
It could be virtual tables.
These tables will hold measures and dimensions, and you will have relationships between tables that sort of create your data graph. And then you might have some slices of that data graph. We call them views at Cube. Looker calls them Explores. Power BI calls them perspectives. But essentially, they're just slices of this data graph that you present to the end users.
So that's how we define the semantic layer right now.
It's all for analytics purposes.
So it's not a general purpose
knowledge graph
from that perspective,
but it's enough to cover
analytics use cases.
Let's fast forward a little bit
because you actually already mentioned
something that hinted to some new features
that you're about to release.
And I had the chance to read a preview of those features
and it seems like at least a big part of it
is centered around using LLMs to do what, precisely? Is it, again, a sort of natural language interface for people to define their metrics, or a bit more than that?
Right. Yeah, I already mentioned one potential use of LLMs with the semantic layer and the data modeling, and how an LLM can generate a data model. So essentially, if you think about that use case,
eventually, I believe that LLMs will build and maintain semantic models over time.
That might not happen tomorrow, but it will happen over the course of the next few years.
That's one use case.
The other use case,
and that's what we're releasing this week,
is how we can leverage semantic layer context
and the knowledge to create better AI assistants
that can access data in our data warehouses.
So if we think about the problem from an engineering standpoint, it's a text-to-SQL problem, right?
So imagine you need to build an AI agent
that should be able to give you the correct answer
based on your data in your data warehouse.
Maybe it needs to run some calculation,
calculate the revenue per salesperson and compare that to the average revenue across all of them,
all the account executives. And to do that, an AI agent needs to generate a SQL code.
It's not a problem to generate SQL code as such, to make it just ANSI SQL, because there are so many examples of ANSI SQL on the internet.
The problem is that your AI agent
doesn't know about your metrics.
It doesn't know about your data model.
If you just give it a table or a set of tables, and give the DDL of these tables as context, it's not going to really learn a lot.
It needs to get more prepared data on one hand.
So the metrics already defined,
but it also needs to get more context about it.
What is that metric?
Why is it here?
How is it connected to that other metric?
How is that revenue connected to other metrics? How is your revenue connected to account executives? What is an account executive? How does it sit in the organization? So all of this knowledge, essentially a knowledge graph, all these relationships, it needs to have that context.
That context can be represented as just text at the end of the day, describing everything that you have in your data model. So you can take that text and give it to the LLM; now your LLM has all this context, and it can generate a correct SQL query. You also don't want that LLM to generate a query directly against your data warehouse, because the query is still going to be big.
It might involve multiple joins, all this calculation.
So there is a lot of room to hallucinate and make a mistake.
And it would be hard to debug that.
You wanted that to make a query to your semantic layer.
So the query is going to be very simple.
It's going to be just kind of,
give me this metric with that dimension
and apply that filter.
Once you send that to your semantic layer, the semantic layer can apply the data model it has inside and then generate the real, big SQL in a more deterministic way, right? And then execute that SQL and send the data all the way back to the AI agent.
So that's what you're releasing
as a sort of first step toward that vision,
which is releasing an API endpoint
where you would be able to send your natural language query
saying like, hey, I want to look at my average,
compare my average, you know, like sales
with average sales per specific reps in that region.
And then the LLM can generate all these queries,
run them, execute them, and give you results back.
And then you can use it if you're building a chatbot,
you can use it directly,
or if you're building maybe like a more complicated AI agent,
you can use it as part of your chain, right?
So you can get that result based on this information,
apply some
other action and kind of keep it going. It's going to be just an API endpoint that can plug
into any chain, into any RAG architecture.
Interesting. So, in the process that you mentioned, one crucial part is the context, which you need to bring in so that the LLM can have the right background information to actually translate that textual input into something more structured to pass on to the semantic layer.
How do you actually feed that context to the LLM currently?
It depends.
If the context, if the data model, is not huge, you can just take it and put it into the prompt as is. If it's a larger context, you probably want to turn it into embeddings beforehand and then apply some RAG architecture, where you do a vector search for the relevant context based on the query you're getting, and then apply only a subset of that context. So there are some optimizations you can do. But from an architectural perspective, once we have the context, once we're able to do all this metadata indexing and understand what relevant information we have, we can turn it into embeddings. And then from there, we're just good to go, which is applying best practices for building RAG architectures.
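As a very rough sketch of the flow described here, and not Cube's actual implementation, the context handling could look something like the Python below; embed, vector_store, llm_complete, and run_semantic_layer_query are hypothetical placeholders for whatever embedding model, vector database, LLM client, and semantic layer API are actually used.

```python
# Hedged sketch of RAG over a semantic layer's data model, with placeholder
# dependencies injected as arguments; none of these names come from Cube itself.

def build_context(data_model_docs, question, vector_store, embed, top_k=5):
    """Pick the most relevant slices of the data model for this question."""
    question_vector = embed(question)
    # Vector search over pre-embedded descriptions of cubes, measures, dimensions.
    hits = vector_store.search(question_vector, limit=top_k)
    return "\n".join(data_model_docs[hit.id] for hit in hits)


def answer(question, data_model_docs, vector_store, embed,
           llm_complete, run_semantic_layer_query):
    """Turn a natural language question into a simple semantic-layer query."""
    context = build_context(data_model_docs, question, vector_store, embed)
    prompt = (
        "You can query a semantic layer. Available measures and dimensions:\n"
        f"{context}\n\n"
        "Return a JSON query with 'measures', 'dimensions' and 'filters' "
        f"that answers: {question}"
    )
    semantic_query = llm_complete(prompt)  # small, simple query, easy to check
    # The semantic layer expands this into the real SQL deterministically.
    return run_semantic_layer_query(semantic_query)
```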
Okay, so it sounds like you're using a kind of mixed approach.
Let's say sometimes you use the context window.
Sometimes you use RAG, basically.
I have a tip here for you, just to tie this back into the previous conversation on knowledge graphs and so on. Some people from the knowledge graph community did an interesting experiment and tried to compare and contrast these two different approaches, specifically for RAG. In one instance, they did precisely what you are trying to do here, so basically feed context to LLMs in order to access a back-end database. In one set of experiments they used the schema information from the database as is for the LLM, and in the other set of experiments they used the same information, but morphed into a knowledge graph. They compared the results of the two sets of experiments, and they found that by using knowledge graph-based RAG, they got much better results.
Yeah, I think I saw that paper you're referring to.
And I remember folks in our community, what they did is they took that experiment and ran it, but instead of knowledge graphs they used the Cube semantic layer, and I think they got even better results. Many things are already sort of pre-calculated in a Cube semantic layer. So if you think about it, going from a DDL to a knowledge graph is one improvement, and then going from a knowledge graph to a semantic layer is another improvement.
Interesting. I didn't know about that last bit. But yeah, I guess it just goes to show that
there's lots of... It's in the early stages, so there's lots of room for improvement, I guess.
Yeah, for sure.
Okay, and the other new feature that I saw you releasing has to do with chart prototyping.
So what's that exactly?
Yeah, so Cube is commonly used to build different data applications.
We were talking a lot about using BI tools and LLMs, right? But sometimes you just need to build your internal data app, maybe in React or Python or some other visualization tool, and you just want to present it internally, show it to customers. So we are releasing a way to quickly prototype different types of views and visualizations on top of Cube. We are still a headless technology, so we're not providing you with a visualization layer, but we give you a way to quickly iterate on different visualization options, just to bring it to production sooner. That's been a very common request from many customers: how they can quickly prototype charts on top of Cube APIs so they can build different apps. One kind of closing,
I guess, question, because we're almost out of time: you did mention previously
that the core technology, at least, is open source. And now we are just talking about new features and so on. So I wonder which parts
are the core open source ones and which are the value-add features. And so if people want to get
started with Cube, where do they start and what can they start experimenting with?
Yeah, we have Cube Cloud; think about it as our commercial cloud product. It's just a more feature-rich platform.
All the different AI capabilities are built into the cloud platform.
Things like chart prototyping are built into the cloud platform.
There are more integrations with BI tools built into the cloud platform,
integration with Excel built into the cloud platform, all of that.
But the core data modeling experience, you can still create your data model in open source, right?
You can write these YAML files in open source.
You can compile them.
You can run them.
So there is like an open source core.
The easiest way to start is just to sign up at cube.dev.
That's going to be the cloud product.
We have a free tier, a very generous free tier, in that you can develop, you can build a staging environment and even a small production on it.
And that would probably be the best way to start with Cube.
Well, thanks for the conversation.
It was a good, interesting one.
I learned a few things,
which is always what I like to walk away
from conversations with.
And I hope it was a good one for you as well.
And I guess we can wrap up right about here
unless there's anything that you feel
we didn't cover and we should.
No, that was a great conversation.
Thank you.
I really enjoyed it.
Thanks for sticking around.
For more stories like this,
check the link in bio
and follow Linked Data Orchestration.